I. INTRODUCTION






The top quark is the most massive known elementary particle. Its mass, m_t, is 173.3 ± 1.1 GeV/c² [1], about forty times larger than that of the bottom quark, the second-most massive standard model (SM) fermion. The top quark's large mass, at the scale of electroweak symmetry breaking, hints that it may play a role in the mechanism of mass generation. The presence of the top quark was established in 1995 by the CDF and D0 collaborations with approximately 60 pb⁻¹ of pp̄ data collected per collaboration at √s = 1.8 TeV […] in Run I at the Fermilab Tevatron. The production mechanism used in the observation of the top quark was tt̄ pair production via the strong interaction.

Since then, larger data samples have enabled detailed study of the top quark. The tt̄ production cross section, the top quark's mass [1], the top quark decay branching fraction to Wb […], and the polarization of W bosons in top quark decay [6] have been measured precisely. Nonetheless, many properties of the top quark have not yet been tested as precisely. In particular, the Cabibbo-Kobayashi-Maskawa (CKM) matrix element V_tb remains poorly constrained by direct measurements […]. The strength of the coupling, |V_tb|, governs the decay rate of the top quark and its decay width into Wb; other decays are expected to have much smaller branching fractions. Using measurements of the other CKM matrix elements, and assuming a three-generation SM with a 3 × 3 unitary CKM matrix, |V_tb| is expected to be very close to unity.

Top quarks are also expected to be produced singly in pp̄ collisions via weak, charged-current interactions. The dominant processes at the Tevatron are the s-channel process, shown in Fig. 1(a), and the t-channel process [8], shown in Fig. 1(b). The next-to-leading-order (NLO) cross sections for these two processes are σ_s = 0.88 ± 0.11 pb and σ_t = 1.98 ± 0.25 pb, respectively […]. Each cross section is the sum of the single t and the single t̄ predictions. Throughout this paper, charge-conjugate states are implied; all cross sections and yields are shown summed over charge-conjugate states. A calculation has been performed resumming soft-gluon corrections and calculating finite-order expansions through next-to-next-to-next-to-leading order (NNNLO) […], yielding σ_s = 0.98 ± 0.04 pb and σ_t = 2.16 ± 0.12 pb, also assuming m_t = 175 GeV/c². Newer calculations are also available [13, 14]. A third process, the associated production of a W boson and a top quark, shown in Fig. 1(c), has a very small expected cross section at the Tevatron.

Measuring the two cross sections σ_s and σ_t provides a direct determination of |V_tb|, allowing an overconstrained test of the unitarity of the CKM matrix, as well as an indirect determination of the top quark's lifetime. We assume that the top quark decays to Wb 100% of the time in order to measure the production cross sections. This assumption does not constrain |V_tb| to be near unity; instead it is the same as assuming |V_tb|² ≫ |V_ts|² + |V_td|².

FIG. 1: Representative Feynman diagrams of single top quark production. Figures (a) and (b) are s- and t-channel processes, respectively, while figure (c) is associated Wt production, which contributes a small amount to the expected cross section at the Tevatron.
Many extensions to the SM predict measurable deviations of σ_s or σ_t from their SM values. One of the simplest of these is the hypothesis that a fourth generation of fermions exists beyond the three established ones. Aside from the constraint that its neutrino must be heavier than M_Z/2 […] and that the quarks must escape current experimental limits, the existence of a fourth generation of fermions remains possible. If these additional sequential fermions exist, then a 4 × 4 version of the CKM matrix would be unitary, and the 3 × 3 submatrix would not necessarily be unitary. The presence of a fourth generation would in general reduce |V_tb|, thereby reducing the single top quark production cross sections σ_s and σ_t. Precision electroweak constraints provide some information on possible values of |V_tb| in this extended scenario, but a direct measurement provides a test with no additional model dependence.
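Because the single top quark production rate scales as |V_tb|² when the other CKM contributions are negligible (consistent with the assumption |V_tb|² ≫ |V_ts|² + |V_td|² stated above), a measured cross section converts to |V_tb| through a square root of its ratio to the SM prediction computed with |V_tb| = 1. The helper below is a hypothetical sketch, not the analysis code: it ignores theoretical uncertainties, uses the NLO predictions quoted above as the SM reference, and feeds in a made-up "measured" value purely for illustration.

```python
import math


def vtb_from_cross_section(sigma_measured_pb: float, sigma_sm_pb: float) -> float:
    """Hypothetical helper: |Vtb| implied by a measured single top cross section.

    Assumes sigma scales as |Vtb|^2 and that sigma_sm_pb was computed for |Vtb| = 1.
    """
    if sigma_measured_pb < 0.0 or sigma_sm_pb <= 0.0:
        raise ValueError("need sigma_measured >= 0 and sigma_sm > 0")
    return math.sqrt(sigma_measured_pb / sigma_sm_pb)


# SM reference: NLO s-channel + t-channel predictions quoted in the text (pb).
sigma_sm_total = 0.88 + 1.98
# Purely illustrative "measured" value; not a result of this paper.
print(f"|Vtb| ~ {vtb_from_cross_section(2.0, sigma_sm_total):.3f}")
```

A cross section below the SM expectation therefore maps to |V_tb| < 1, which is how a fourth-generation scenario would show up in this measurement.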

Other new physics scenarios predict larger values of σ_s and σ_t than those expected in the SM. A flavor-changing Ztc coupling, for example, would manifest itself in the production of pp̄ → tc events, which may show up in the measured value of either σ_s or σ_t, depending on the relative acceptances of the measurement channels. An additional charged gauge boson W′ may also enhance the production cross sections. A review of new physics models affecting the single top quark production cross section and polarization properties is given in [17].

Even in the absence of new physics, assuming the SM constraints on |V_tb|, a measurement of the t-channel single top quark production cross section provides a test of the b-quark parton distribution function of the proton.

Single top quark production is one of the background processes in the search for the Higgs boson H in the WH → ℓνbb̄ channel, since the two share the same final state, and a direct measurement of single top quark production may improve the sensitivity of the Higgs boson search. Furthermore, the backgrounds to the single top quark search are also backgrounds to the Higgs boson search, so careful understanding of these backgrounds lays the groundwork for future Higgs boson searches. Since the single top quark processes have larger cross sections than the Higgs boson signal in the WH → ℓνbb̄ mode […], and since the single top quark signal is more distinct from the backgrounds than the Higgs boson signal is, we must pass the milestone of observing single top quark production along the way to testing for Higgs boson production.

Measuring the single top quark cross section is well motivated, but it is also extremely challenging at the Tevatron. The total production cross section is expected to be about one-half of that of tt̄ production […], and with only one top quark in the final state instead of two, the signal is far less distinct from the dominant background processes than tt̄ production is. The rate at which a W boson is produced along with jets, at least one of which must have a displaced vertex that passes our requirements for B hadron identification (we say in this paper that such jets are b-tagged), is approximately twelve times the signal rate. The a priori uncertainties on the background processes are about a factor of three larger than the expected signal rate. In order to expect to observe single top quark production, the background rates must be small and well constrained, and the expected signal must be much larger than the uncertainty on the background. A much purer sample of signal events therefore must be separated from the background processes in order to make observation possible.

Single top quark production is characterized by a number of kinematic properties. The top quark mass is known, and precise predictions of the distributions of observable quantities for the top quark and the recoil products are also available. Top quarks produced singly via the weak interaction are expected to be nearly 100% polarized […]. The background W+jets and tt̄ processes have characteristics which differ from those of single top quark production. Kinematic properties, coupled with the b-tagging requirement, provide the keys to purification of the signal. Because signal events differ from background events in several ways, such as in the distribution of the invariant mass of the final-state objects assigned to be the decay products of the top quark and in the rapidity of the recoiling jets, and because the task of observing single top quark production requires the maximum separation, we apply multivariate techniques. The techniques described in this paper together achieve a signal-to-background ratio of more than 5:1 in a subset of events with a significant signal expectation. This high purity is needed in order to overcome the uncertainty in the background prediction.

The effect of the background uncertainty is reduced by fitting for both the signal and the background rates together to the observed data distributions, a technique which is analogous to fitting the background in the sidebands of a mass peak, but which is applied in this case to multivariate discriminant distributions. Uncertainties are incurred in this procedure: the shapes of the background distributions are imperfectly known from simulations. We check in detail the modeling of the distributions of the inputs and the outputs of the multivariate techniques, using events passing our selection requirements, and also separately using events in control samples depleted in signal. We also check the modeling of the correlations between pairs of these variables. In general we find excellent agreement, with some imperfections. We assess uncertainties on the shapes of the discriminant outputs both from a priori uncertain parameters in the modeling and from discrepancies observed in the modeling of the data by the Monte Carlo simulations. These shape uncertainties are included in the signal rate extraction and in the calculation of the significance.
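As a toy illustration of fitting the signal and background rates together, the sketch below fits one signal template and one background template, each with a free normalization scale, to a binned discriminant histogram by minimizing a Poisson negative log-likelihood over a coarse grid. The template contents, the grid search, and the function names are assumptions made for illustration; the analyses described here have several background processes and include shape uncertainties, which this sketch omits.

```python
import math


def poisson_nll(observed, expected):
    """Binned Poisson -ln L, dropping the parameter-independent ln(n!) term."""
    total = 0.0
    for n, mu in zip(observed, expected):
        if mu <= 0.0:
            return math.inf
        total += mu - n * math.log(mu)
    return total


def fit_rates(observed, signal, background, step=0.02, max_scale=3.0):
    """Grid-search the signal scale beta_s >= 0 and background scale beta_b > 0."""
    n_steps = int(round(max_scale / step))
    best_nll, best = math.inf, (None, None)
    for i in range(n_steps + 1):
        beta_s = i * step
        for j in range(1, n_steps + 1):
            beta_b = j * step
            expected = [beta_s * s + beta_b * b for s, b in zip(signal, background)]
            nll = poisson_nll(observed, expected)
            if nll < best_nll:
                best_nll, best = nll, (beta_s, beta_b)
    return best


# Toy discriminant histogram: background peaks at low output, signal at high output.
signal = [1.0, 2.0, 5.0, 10.0]
background = [50.0, 20.0, 8.0, 2.0]
observed = [51, 22, 13, 12]  # equals signal + background, so the fit should land near (1, 1)
print(fit_rates(observed, signal, background))
```

The low-discriminant bins, which are dominated by background, pin down the background scale much as sidebands do in a mass-peak fit, while the high-discriminant bins constrain the signal scale.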

Both the CDF and the D0 Collaborations have searched for single top quark production in pp̄ collision data taken at √s = 1.96 TeV in Run II at the Fermilab Tevatron. The D0 Collaboration reported evidence for the production of single top quarks in 0.9 fb⁻¹ of data […], and observation of the process in 2.3 fb⁻¹ […]. More recently, D0 has conducted a measurement of the single top quark production cross section in the τ+jets final state using 4.8 fb⁻¹ of data. The CDF Collaboration reported evidence in 2.2 fb⁻¹ of data [26] and observation in 3.2 fb⁻¹ of data [27]. This paper describes in detail the four W+jets analyses of [27]; the analyses are based on multivariate likelihood functions (LF), artificial neural networks (NN), matrix elements (ME), and boosted decision trees (BDT). These analyses select events with a high-p_T charged lepton, large missing transverse energy E̸_T, and two or more jets, at least one of which is b-tagged. Each analysis separately measures the single top quark production cross section and calculates the significance of the observed excess. We report here a single set of results and therefore must combine the information from each of the four analyses. Because there is 100% overlap in the data and Monte Carlo events selected by the analyses, a natural combination technique is to use the individual analyses' discriminant outputs as inputs to a super discriminant function evaluated for each event. The distributions of this super discriminant are then interpreted in the same way as those of each of the four component analyses.

A separate analysis is conducted on events without an identified charged lepton, in a data sample which corresponds to 2.1 fb⁻¹ of data. Missing transverse energy plus jets, one of which is b-tagged, is the signature used for this fifth analysis (MJ), which is described in detail in […]. There is no overlap of events selected by the MJ analysis and the W+jets analyses. The results of this analysis are combined with the results of the super discriminant analysis to yield the final results: the measured total cross section σ_s + σ_t, |V_tb|, the separate cross sections σ_s and σ_t, and the statistical significance of the excess. With the combination of all analyses, we observe single top quark production with a significance of 5.0 standard deviations.

The analyses described in this paper were blind to the selected data when they were optimized for their expected sensitivities. Furthermore, since the publication of the 2.2 fb⁻¹ W+jets results [26], the event selection requirements, the multivariate discriminants for the analyses shared with that result, and the systematic uncertainties remain unchanged; new data were added without further optimization or retraining. The 2.2 fb⁻¹ results were validated in a blind fashion. The distributions of all relevant variables were first checked for accurate modeling by our simulations and data-based background estimations in control samples of data that do not overlap with the selected signal sample. Then the distributions of the discriminant input variables, and also other variables, were checked in the sample of events passing the selection requirements. After that, the modeling of the low signal-to-background portions of the final output histograms was checked. Only after all of these validation steps were completed were the data in the most sensitive regions revealed. Two new analyses, BDT and MJ, have been added for this paper, and they were validated in a similar way.

This paper is organized as follows. Section II describes the CDF II detector. Section III describes the event selection. Section IV describes the simulation of signal events and the acceptance of the signal. Section V describes the background rate and kinematic shape modeling. Section VI describes a neural-network flavor separator which helps separate b jets from others. Section VII describes the four W+jets multivariate analysis techniques. Section VIII describes the systematic uncertainties we assess. Section IX describes the statistical techniques for extraction of the signal cross section and the significance. Section X describes the super discriminant. Section XI presents our results for the cross section, |V_tb|, and the significance. Section XII describes an extraction of σ_s and σ_t in a joint fit, and Section XIII summarizes our results.



II. THE CDF II DETECTOR 

The CDF II detector […] is a general-purpose particle detector with azimuthal and forward-backward symmetry. Positions and angles are expressed in a cylindrical coordinate system, with the z axis directed along the proton beam. The azimuthal angle φ around the beam axis is defined with respect to a horizontal ray running outwards from the center of the Tevatron, and radii are measured with respect to the beam axis. The polar angle θ is defined with respect to the proton beam direction, and the pseudorapidity η is defined to be η = −ln[tan(θ/2)]. The transverse energy (as measured by the calorimetry) and momentum (as measured by the tracking systems) of a particle are defined as E_T = E sin θ and p_T = p sin θ, respectively. Figure 2 shows a cutaway isometric view of the CDF II detector.
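The coordinate definitions above translate directly into code. The helper names below are illustrative only; the inverse relation θ = 2 arctan(e^−η) and the identity E_T = E sin θ = E/cosh η follow algebraically from the definition of η.

```python
import math


def pseudorapidity(theta: float) -> float:
    """eta = -ln[tan(theta / 2)], with theta the polar angle from the proton beam (z) axis."""
    return -math.log(math.tan(theta / 2.0))


def polar_angle(eta: float) -> float:
    """Inverse relation: theta = 2 * atan(exp(-eta))."""
    return 2.0 * math.atan(math.exp(-eta))


def transverse(magnitude: float, theta: float) -> float:
    """E_T = E sin(theta) or p_T = p sin(theta)."""
    return magnitude * math.sin(theta)


# The edge of the central calorimeter, |eta| = 1.1, expressed as a polar angle in degrees.
print(math.degrees(polar_angle(1.1)))
# A 50 GeV deposit at eta = 1.1: E_T = E sin(theta) = E / cosh(eta).
print(transverse(50.0, polar_angle(1.1)))
```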

A silicon tracking system and an open-cell drift chamber are used to measure the momenta of charged particles. The CDF II silicon tracking system consists of three subdetectors: a layer of single-sided silicon microstrip detectors located immediately outside the beam pipe (layer 00) […], a five-layer, double-sided silicon microstrip detector (SVX II) covering the region between 2.5 and 11 cm from the beam axis […], and intermediate silicon layers (ISL) […] located at radii between 19 cm and 29 cm, which provide linking between track segments in the drift chamber and the SVX II. The typical intrinsic hit resolution of the silicon detector is 11 μm. The impact parameter resolution is σ(d₀) ≈ 40 μm, of which approximately 35 μm is due to the transverse size of the Tevatron interaction region. The entire system reconstructs tracks in three dimensions with the precision needed to identify displaced vertices associated with b and c hadron decays.

The central outer tracker (COT) […], the main tracking detector of CDF II, is an open-cell drift chamber, 3.1 m in length. It is segmented into eight concentric superlayers. The drift medium is a mixture of argon and ethane. Sense wires are arranged in eight alternating axial and ±2° stereo superlayers with twelve layers of wires in each. The active volume covers the radial range from 40 cm to 137 cm. The tracking efficiency of the COT is nearly 100% in the range |η| < 1, and with the addition of silicon coverage, tracks can be detected within the range |η| < 1.8.

The tracking systems are located within a superconducting solenoid, which has a diameter of 3.0 m and generates a 1.4 T magnetic field parallel to the beam axis. The magnetic field is used to measure the charged-particle momentum transverse to the beamline. The momentum resolution is σ(p_T)/p_T ≈ 0.1% × p_T/(GeV/c) for tracks within |η| < 1.0 and degrades with increasing |η|.
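Because the fractional resolution grows linearly with p_T (the tracker measures curvature, which falls as 1/p_T), high-momentum tracks are measured less precisely. The small sketch below evaluates the quoted parametrization for central tracks only; the function name and the decision to ignore the η dependence are assumptions for illustration.

```python
from typing import Tuple


def cot_pt_resolution(pt_gev: float, coeff_per_gev: float = 0.001) -> Tuple[float, float]:
    """Return (relative, absolute) p_T resolution for sigma(p_T)/p_T ~ coeff * p_T.

    coeff_per_gev = 0.001 encodes the quoted 0.1% x p_T/(GeV/c), valid for |eta| < 1.0.
    """
    relative = coeff_per_gev * pt_gev
    return relative, relative * pt_gev


for pt in (20.0, 50.0, 100.0):
    rel, absolute = cot_pt_resolution(pt)
    print(f"p_T = {pt:5.1f} GeV/c: {100 * rel:.0f}% -> {absolute:.1f} GeV/c")
```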

Front electromagnetic lead-scintillator sampling calorimeters […] and rear hadronic iron-scintillator sampling calorimeters [39] surround the solenoid and measure the energy flow of interacting particles. They are segmented into projective towers, each one covering a small range in pseudorapidity and azimuth. The full array has an angular coverage of |η| < 3.6. The central region |η| < 1.1 is covered by the central electromagnetic calorimeter (CEM) and the central and end-wall hadronic calorimeters (CHA and WHA). The forward region 1.1 < |η| < 3.6 is covered by the end-plug electromagnetic calorimeter (PEM) and the end-plug hadronic calorimeter (PHA). Energy deposits in the electromagnetic calorimeters are used for electron identification and energy measurement. The energy resolution for an electron with transverse energy E_T (measured in GeV) is given by σ(E_T)/E_T ≈ 13.5%/√E_T ⊕ 1.5% and σ(E_T)/E_T ≈ 16.0%/√E_T ⊕ 1% for electrons identified in the CEM and PEM, respectively, where ⊕ denotes addition in quadrature. Jets are identified and measured through the energy they deposit in the electromagnetic and hadronic calorimeter towers. The calorimeters provide jet energy measurements with a resolution of approximately σ(E_T) ≈ 0.1·E_T + 1.0 GeV [36]. The CEM and PEM calorimeters have two-dimensional readout strip detectors located at shower maximum […]. These detectors provide higher-resolution position measurements of electromagnetic showers than are available from the calorimeter tower segmentation alone, and also provide local energy measurements. The shower maximum detectors contribute to the identification of electrons and photons, and help separate them from π⁰ decays.

FIG. 2: Cutaway isometric view of the CDF II detector.
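The calorimeter resolution parametrizations quoted above can be evaluated with a few lines of code, with E_T in GeV and the quadrature sum ⊕ implemented by `math.hypot`. The function names are illustrative; note that the electron formulas give fractional resolutions while the jet formula gives an absolute one.

```python
import math


def em_fractional_resolution(et_gev: float, stochastic: float, constant: float) -> float:
    """sigma(E_T)/E_T = stochastic / sqrt(E_T) (+) constant, with (+) = addition in quadrature."""
    return math.hypot(stochastic / math.sqrt(et_gev), constant)


def jet_resolution_gev(et_gev: float) -> float:
    """Approximate absolute jet resolution sigma(E_T) ~ 0.1 * E_T + 1.0 GeV."""
    return 0.1 * et_gev + 1.0


CEM = (0.135, 0.015)  # 13.5%/sqrt(E_T) (+) 1.5%
PEM = (0.160, 0.010)  # 16.0%/sqrt(E_T) (+) 1%

for et in (20.0, 50.0):
    print(
        f"E_T = {et:.0f} GeV: CEM {100 * em_fractional_resolution(et, *CEM):.1f}%, "
        f"PEM {100 * em_fractional_resolution(et, *PEM):.1f}%, "
        f"jet sigma ~ {jet_resolution_gev(et):.1f} GeV"
    )
```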



Beyond the calorimeters resides the muon system, which provides muon detection in the range |η| < 1.5. For the analyses presented in this article, muons are detected in four separate subdetectors. Muons with p_T > 1.4 GeV/c penetrating the five absorption lengths of the calorimeter are detected in the four layers of planar multi-wire drift chambers of the central muon detector (CMU) […]. Behind an additional 60 cm of steel, a second set of four layers of drift chambers, the central muon upgrade (CMP) […], detects muons with p_T > 2.2 GeV/c. The CMU and CMP cover the same part of the central region, |η| < 0.6. The central muon extension (CMX) […] extends the pseudorapidity coverage of the muon system from 0.6 to 1.0 and thus completes the coverage over the full fiducial region of the COT. Muons with 1.0 < |η| < 1.5 are detected by the barrel muon chambers (BMU) […].



The Tevatron collider luminosity is determined with multi-cell gas Cherenkov detectors [44] located in the region 3.7 < |η| < 4.7, which measure the average number of inelastic pp̄ collisions per bunch crossing. The total uncertainty on the luminosity is ±6.0%, of which 4.4% comes from the acceptance and the operation of the luminosity monitor and 4.0% comes from the uncertainty of the inelastic pp̄ cross section [45].
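If the two luminosity components are treated as independent and added in quadrature (an assumption; the text quotes only the total), they reproduce the quoted total to within the rounding of the inputs: √(4.4² + 4.0²) ≈ 5.95%, i.e. about 6.0%. A minimal check:

```python
import math

# Relative luminosity uncertainty components quoted in the text, in percent.
components = {
    "luminosity monitor acceptance and operation": 4.4,
    "inelastic ppbar cross section": 4.0,
}

# Assumption: independent components combined in quadrature.
total = math.sqrt(sum(value ** 2 for value in components.values()))
print(f"combined: {total:.2f}%")
```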







FIG. 3: Feynman diagrams showing the final states of the dominant s-channel (a) and t-channel (b) processes, with leptonic W boson decays. Both final states contain a charged lepton, a neutrino, and two jets, at least one of which originates from a b quark.



III. SELECTION OF CANDIDATE EVENTS 

Single top quark events (see Fig. 3) have jets, a charged lepton, and a neutrino in the final state. The top quark decays into a W boson and a b quark before hadronizing. The quarks recoiling from the top quark, and the b quark from top quark decay, hadronize to form jets, motivating our event selection, which requires two or three energetic jets (the third can come from a radiated gluon), at least one of which is b-tagged, and the decay products of a W boson. In order to reduce background from multijet production via the strong interaction, we focus our event selection on the decays of the W boson to eν_e or μν_μ in these analyses. Such events have one charged lepton (an electron or a muon), missing transverse energy resulting from the undetected neutrino, and at least two jets. These events constitute the W+jets sample. We also include the acceptance for signal and background events in which W → τν, and the MJ analysis is also sensitive to W boson decays to τ leptons.

Since the pp̄ collision rate at the Tevatron exceeds the rate at which events can be written to tape by five orders of magnitude, CDF has an elaborate trigger system with three levels. The first level uses special-purpose hardware […] to reduce the event rate from the effective beam-crossing frequency of 1.7 MHz to approximately 15 kHz, the maximum rate at which the detector can be read out. The second level consists of a mixture of dedicated hardware and fast software algorithms and takes advantage of the full information read out of the detector […]. At this level the trigger rate is reduced further to less than 800 Hz. At the third level, a computer farm running fast versions of the offline event reconstruction algorithms refines the trigger selections based on quantities that are nearly the same as those used in offline analyses. In particular, detector calibrations are applied before the trigger requirements are imposed. The third-level trigger selects events for permanent storage at a rate of up to 200 Hz.

Many different trigger criteria are evaluated at each level, and events passing specific criteria at one level are considered by a subset of trigger algorithms at the next level. A cascading set of trigger requirements is known as a trigger path. This analysis uses the trigger paths which select events with high-p_T electron or muon candidates. The acceptance of these triggers for tau leptons is included in our rate estimates, but the triggers are not optimized for identifying tau leptons. An additional trigger path, which requires significant E̸_T plus at least two high-p_T jets, is also used to add W+jets candidate events with non-triggered leptons, which include charged leptons outside the fiducial volumes of the electron and muon detectors, as well as tau leptons.

The third-level central electron trigger requires a COT track with p_T > 9 GeV/c matched to an energy cluster in the CEM with E_T > 18 GeV. The shower profile of this cluster as measured by the shower-maximum detector is required to be consistent with those measured using test-beam electrons. Electron candidates with |η| > 1.1 are required to deposit more than 20 GeV in a cluster in the PEM, and the ratio of hadronic energy to electromagnetic energy E_PHA/E_PEM for this cluster is required to be less than 0.075. The third-level muon trigger requires a COT track with p_T > 18 GeV/c matched to a track segment in the muon chambers. The E̸_T+jets trigger path requires E̸_T > 35 GeV and two jets with E_T > 10 GeV.

After offline reconstruction, we impose further requirements on the electron candidates in order to improve the purity of the sample. A reconstructed track with p_T > 9 GeV/c must match a cluster in the CEM with E_T > 20 GeV. Furthermore, we require E_had/E_em < 0.055 + 0.00045 × E/GeV, and the ratio of the energy of the cluster to the momentum of the track, E/p, has to be smaller than 2.0c for track momenta below 50 GeV/c. For electron candidates with track momenta p > 50 GeV/c, no requirement on E/p is made, as the misidentification rate is small. Candidate objects which fail these requirements are more likely to be hadrons or jets than those that pass.
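The offline central-electron requirements can be sketched as a single predicate. The data class and function names are hypothetical, and two interpretations are assumptions: the E in the hadronic-fraction cut is taken to be the cluster energy, and the track-cluster matching, shower-profile, and conversion checks are omitted entirely.

```python
from dataclasses import dataclass


@dataclass
class CentralElectronCandidate:
    track_pt: float    # GeV/c
    track_p: float     # GeV/c
    cluster_et: float  # CEM cluster E_T, GeV
    cluster_e: float   # CEM cluster energy, GeV
    e_had: float       # hadronic energy behind the cluster, GeV
    e_em: float        # electromagnetic energy of the cluster, GeV


def passes_offline_cem_cuts(c: CentralElectronCandidate) -> bool:
    """Sketch of the offline CEM electron requirements quoted in the text.

    Assumption: the E in the hadronic-fraction cut is the cluster energy.
    Track-cluster matching, shower-profile, and conversion checks are omitted.
    """
    if c.track_pt <= 9.0 or c.cluster_et <= 20.0:
        return False
    if c.e_em <= 0.0 or c.e_had / c.e_em >= 0.055 + 0.00045 * c.cluster_e:
        return False
    # E/p < 2.0 (in units of c) is required only for track momenta below 50 GeV/c.
    if c.track_p < 50.0 and c.cluster_e / c.track_p >= 2.0:
        return False
    return True


good = CentralElectronCandidate(30.0, 35.0, 30.0, 35.0, 0.5, 34.5)
print(passes_offline_cem_cuts(good))
```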

Electron candidates in the forward direction (PHX) are defined by a cluster in the PEM with E_T > 20 GeV and E_had/E_em < 0.05. The cluster position and the primary vertex position are combined to form a search trajectory in the silicon tracker and seed the pattern recognition of the tracking algorithm.

Electron candidates in the CEM and PHX are rejected if an additional high-p_T track is found which forms a common vertex with the track of the electron candidate and has the opposite sign of curvature. Such events are likely to stem from the conversion of a photon. Figure 4(a) shows the (η, φ) distributions of CEM and PHX electron candidates.

Muon candidates are identified by requiring the presence of a COT track with p_T > 20 GeV/c that extrapolates to a track segment in one or more muon chambers. The muon trigger may be satisfied by two types of muon candidates, called CMUP and CMX. A CMUP muon candidate is one in which track segments matched to the COT track are found in both the CMU and the CMP chambers. A CMX muon is one in which the track segment is found in the CMX muon detector. In order to minimize background contamination, further requirements are imposed. The energy deposition in the electromagnetic and hadronic calorimeters has to correspond to that expected from a minimum-ionizing particle. To reject cosmic-ray muons and muons from in-flight decays of long-lived particles such as K⁰_S, K^±, and Λ particles, the distance of closest approach of the track to the beam line in the transverse plane is required to be less than 0.2 cm if there are no silicon hits on the muon candidate's track, and less than 0.02 cm if there are silicon hits. The remaining cosmic rays are reduced to a negligible level by taking advantage of their characteristic track timing and topology.
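The momentum and impact-parameter requirements on muon candidates reduce to a simple predicate. This is a hypothetical sketch: the minimum-ionizing energy check and the cosmic-ray timing and topology criteria are not modeled, and d₀ denotes the transverse distance of closest approach to the beam line.

```python
def passes_muon_track_cuts(pt_gev: float, d0_cm: float, has_silicon_hits: bool) -> bool:
    """Sketch of the muon p_T and impact-parameter (d0) requirements quoted in the text."""
    if pt_gev <= 20.0:
        return False
    # Tracks with silicon hits are measured precisely enough for a 10x tighter cut.
    limit_cm = 0.02 if has_silicon_hits else 0.2
    return abs(d0_cm) < limit_cm


print(passes_muon_track_cuts(30.0, 0.05, has_silicon_hits=False))
print(passes_muon_track_cuts(30.0, 0.05, has_silicon_hits=True))
```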

In order to add acceptance for events containing muons that cannot be triggered on directly, several additional muon types are taken from the extended muon coverage (EMC) of the E̸_T+jets trigger path: a track segment only in the CMU and a COT track not pointing to the CMP (CMU), a track segment only in the CMP and a COT track not pointing to the CMU (CMP), a track segment in the BMU (BMU), an isolated track not fiducial to any muon chambers (CMIO), an isolated track matched to a muon segment that is not considered fiducial to a muon detector (SCMIO), and a track segment only in the CMX but in a region that cannot be used in the trigger due to tracking limitations of the trigger (CMXNT). Figure 4(b) shows the (η, φ) distributions of muon candidates in each of these categories.
FIG. 4: Distributions in (φ, η) space of the electron (a) and muon (b) selection categories, showing the coverage of the detector that each lepton category provides. The muon categories are more complicated due to the geometrical limitations of the several different muon detectors of CDF.



We require exactly one isolated charged lepton candidate with |η| < 1.6. A candidate is considered isolated if the E_T not assigned to the lepton inside a cone defined by R = √((Δη)² + (Δφ)²) < 0.4 centered around the lepton is less than 10% of the lepton E_T (p_T) for electrons (muons). Such a lepton is called a tight lepton. Loose charged lepton candidates pass all of the lepton selection criteria except for the isolation requirement. We reject events which have an additional tight or loose lepton candidate in order to reduce the Z/γ*+jets and diboson background rates.
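The isolation requirement depends on the cone distance R, where Δφ must be wrapped into (−π, π] so that objects on either side of φ = ±π are treated as neighbors. The sketch below assumes, as an illustrative simplification, that calorimeter deposits are given as (E_T, η, φ) tuples already separated from the lepton's own energy; `lep_scale` is E_T for electrons and p_T for muons.

```python
import math


def delta_phi(phi1: float, phi2: float) -> float:
    """Azimuthal difference wrapped into (-pi, pi]."""
    d = (phi1 - phi2) % (2.0 * math.pi)
    return d - 2.0 * math.pi if d > math.pi else d


def delta_r(eta1: float, phi1: float, eta2: float, phi2: float) -> float:
    """Cone distance R = sqrt((d_eta)^2 + (d_phi)^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def is_isolated(lep_eta, lep_phi, lep_scale, other_deposits, cone=0.4, max_fraction=0.10):
    """other_deposits: (E_T, eta, phi) of calorimeter energy NOT assigned to the lepton.

    lep_scale is the lepton E_T for electrons or p_T for muons.
    """
    in_cone = sum(et for et, eta, phi in other_deposits
                  if delta_r(lep_eta, lep_phi, eta, phi) < cone)
    return in_cone < max_fraction * lep_scale


deposits = [(3.0, 0.1, 0.1), (10.0, 1.0, 1.0)]  # only the first lies inside R < 0.4
print(is_isolated(0.0, 0.0, 40.0, deposits))
```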

Jets are reconstructed using a cone algorithm by summing the transverse calorimeter energy E_T in a cone of radius R < 0.4. The energy deposition of an identified electron candidate, if present, is not included in the jet energy sum. The E_T of a cluster is calculated with respect to the z coordinate of the primary vertex of the event. The energy of each jet is corrected […] for the η dependence and the nonlinearity of the calorimeter response. Routine calibrations of the calorimeter response are performed, and these calibrations are included in the jet energy corrections. The jet energies are also adjusted by subtracting the extra deposition of energy from additional inelastic pp̄ collisions in the same bunch crossing as the triggered event.

Reconstructed jets in events with identified charged lepton candidates must have corrected E_T > 20 GeV and detector |η| < 2.8. Detector η is defined as the pseudorapidity of the jet calculated with respect to the center of the detector. Only events with exactly two or three jets are accepted. At least one of the jets must be tagged as containing a B hadron by requiring a displaced secondary vertex within the jet, using the SECVTX algorithm [31]. Secondary vertices are accepted if the transverse decay length significance (L_xy/σ_xy) is greater than or equal to 7.5.
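The jet-counting and tagging logic can be sketched as follows. The `Jet` class is hypothetical, and the tag is reduced to the single L_xy/σ_xy ≥ 7.5 requirement quoted above; the full SECVTX algorithm applies further track and vertex criteria that are not modeled here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Jet:
    et: float                                 # corrected E_T, GeV
    detector_eta: float                       # eta w.r.t. the detector center
    lxy_significance: Optional[float] = None  # None: no secondary vertex found


def is_b_tagged(jet: Jet) -> bool:
    """Simplified SECVTX-style tag: only the L_xy / sigma_xy >= 7.5 requirement."""
    return jet.lxy_significance is not None and jet.lxy_significance >= 7.5


def passes_jet_selection(jets: List[Jet]) -> bool:
    """Exactly two or three good jets, at least one of them b-tagged."""
    good = [j for j in jets if j.et > 20.0 and abs(j.detector_eta) < 2.8]
    return len(good) in (2, 3) and any(is_b_tagged(j) for j in good)


event = [Jet(45.0, 0.3, 9.2), Jet(30.0, -2.1), Jet(12.0, 1.0)]  # the third jet fails E_T
print(passes_jet_selection(event))
```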

Events passing the E̸_T+jets trigger path and the EMC muon segment requirements described above are also required to have two sufficiently separated jets: ΔR_jj > 1. Furthermore, one of the jets must be central, with |η_jet| < 0.9, and both jets are required to have transverse energies above 25 GeV. These offline selection requirements ensure full efficiency of the E̸_T+jets trigger path.
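The extra jet requirements for events selected through the E̸_T+jets trigger path can be expressed as a short check. Which two jets the requirements apply to is not stated above, so this hypothetical sketch assumes they are the two leading jets in E_T.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Cone distance with the azimuthal difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def passes_emc_jet_requirements(jets):
    """jets: (E_T [GeV], eta, phi) tuples sorted by decreasing E_T.

    Assumption: the requirements are applied to the two leading jets.
    """
    if len(jets) < 2:
        return False
    (et1, eta1, phi1), (et2, eta2, phi2) = jets[:2]
    if et1 <= 25.0 or et2 <= 25.0:
        return False
    if delta_r(eta1, phi1, eta2, phi2) <= 1.0:
        return False
    return abs(eta1) < 0.9 or abs(eta2) < 0.9  # at least one central jet


print(passes_emc_jet_requirements([(40.0, 0.2, 0.0), (30.0, 1.5, 2.0)]))
```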

The vector missing transverse energy $\vec{\not\!\!E}_T$ is defined by

$$\vec{\not\!\!E}_T = -\sum_i E_T^i\,\hat{n}_i, \tag{1}$$

$$i = \text{calorimeter tower number with } |\eta| < 3.6, \tag{2}$$

where $\hat{n}_i$ is a unit vector perpendicular to the beam axis and pointing at the $i$-th calorimeter tower. We also define $\not\!\!E_T = |\vec{\not\!\!E}_T|$. Since this calculation is based on calorimeter towers, E̸_T is adjusted for the effect of the jet corrections for all jets.

A correction is applied to E̸_T for muons, since they traverse the calorimeters without showering. The transverse momenta of all identified muons are added to the measured transverse energy sum, and the average ionization energy is removed from the measured calorimeter energy deposits. We require the corrected E̸_T to be greater than 25 GeV in order to purify a sample containing leptonic W boson decays.
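Equations (1) and (2) together with the muon correction amount to a negative vector sum over towers, plus a replacement of each muon's small calorimeter deposit by its full transverse momentum. In the sketch below, the towers are assumed to carry jet-corrected energies already, and each muon's deposit is assumed to lie along the muon's azimuth; both are simplifying assumptions, and the function names are hypothetical.

```python
import math


def missing_et(towers, muons=(), max_abs_eta=3.6):
    """Missing transverse energy from calorimeter towers, with a muon correction.

    towers: (E_T, eta, phi) of (jet-corrected) calorimeter towers, GeV.
    muons:  (p_T, phi, deposited_E_T) per identified muon; deposited_E_T is the
            ionization energy the muon left in the calorimeter (assumed along phi).
    Returns (met_x, met_y, met).
    """
    sum_x = sum_y = 0.0
    for et, eta, phi in towers:
        if abs(eta) < max_abs_eta:  # Eq. (2): towers with |eta| < 3.6
            sum_x += et * math.cos(phi)
            sum_y += et * math.sin(phi)
    for pt, phi, deposited in muons:
        # Add the muon p_T and remove its calorimeter deposit from the visible sum.
        sum_x += (pt - deposited) * math.cos(phi)
        sum_y += (pt - deposited) * math.sin(phi)
    met_x, met_y = -sum_x, -sum_y  # Eq. (1): MET vector = -sum_i E_T^i n_i
    return met_x, met_y, math.hypot(met_x, met_y)


def passes_met_cut(met_gev: float, threshold: float = 25.0) -> bool:
    return met_gev > threshold


# A 40 GeV/c muon that left 2 GeV in the calorimeter, and nothing else in the event.
print(missing_et([(2.0, 0.1, math.pi / 2)], muons=[(40.0, math.pi / 2, 2.0)]))
```

Without the muon correction, the same event would show only 2 GeV of E̸_T and fail the 25 GeV requirement, which is why the correction matters for muon channels.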

A portion of the background consists of multijet events which do not contain W bosons. We call these "non-W" events below. We select against the non-W background by applying additional selection requirements which are based on the assumption that these events do not have a large E̸_T from an escaping neutrino, but rather that the E̸_T which is observed comes from lost or mismeasured jets. In events lacking a W boson, one would expect small values of the transverse mass, defined as

$$M_T^W = \sqrt{2\left(p_T^{\ell}\,p_T^{\nu} - p_x^{\ell}\,p_x^{\nu} - p_y^{\ell}\,p_y^{\nu}\right)}, \tag{3}$$

where the neutrino transverse momentum components are taken from $\vec{\not\!\!E}_T$.



Because the E̸_T in events that do not contain W bosons often comes from jets which are erroneously identified as charged leptons, the E̸_T often points close to the lepton candidate's direction, giving the event a low transverse mass. Thus, the transverse mass is required to be above 10 GeV for muons and 20 GeV for electrons, which have more of these events.
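Equation (3) is equivalent to M_T = √(2 p_T^ℓ E̸_T (1 − cos Δφ)), so a lepton aligned with the E̸_T gives M_T ≈ 0 while a back-to-back configuration gives the largest value. A sketch of the formula and the flavor-dependent cut, with hypothetical function names and the neutrino p_T taken from the E̸_T vector:

```python
import math


def transverse_mass(lep_pt: float, lep_phi: float, met: float, met_phi: float) -> float:
    """Eq. (3) with the neutrino transverse momentum taken from the missing E_T vector."""
    px_l, py_l = lep_pt * math.cos(lep_phi), lep_pt * math.sin(lep_phi)
    px_n, py_n = met * math.cos(met_phi), met * math.sin(met_phi)
    arg = 2.0 * (lep_pt * met - px_l * px_n - py_l * py_n)
    return math.sqrt(max(arg, 0.0))  # guard against tiny negative rounding


def passes_mt_cut(mt_gev: float, is_muon: bool) -> bool:
    """M_T above 10 GeV for muons and 20 GeV for electrons."""
    return mt_gev > (10.0 if is_muon else 20.0)


print(transverse_mass(40.0, 0.0, 40.0, math.pi))  # back to back: the maximum, 2 * 40
print(transverse_mass(40.0, 0.0, 40.0, 0.0))      # aligned: typical of non-W events
```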



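Further removal of non-W events is performed with a variable called $\not\!\!E_T$ significance ($\not\!\!E_{T,\mathrm{sig}}$), defined as

$$\not\!\!E_{T,\mathrm{sig}} = \frac{\not\!\!E_T}{\sqrt{\sum_{\mathrm{jets}} C_{\mathrm{JES}}^2 \cos^2\!\left(\Delta\phi_{\mathrm{jet},\not E_T}\right) E_{T,\mathrm{jet}}^{\mathrm{raw}} + \cos^2\!\left(\Delta\phi_{\vec{E}_{T,\mathrm{uncl}},\not E_T}\right) \sum E_{T,\mathrm{uncl}}}}, \qquad (4)$$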



where $C_{\mathrm{JES}}$ is the jet energy correction factor [?], $E_{T,\mathrm{jet}}^{\mathrm{raw}}$ is a jet's energy before corrections are applied, $\vec{E}_{T,\mathrm{uncl}}$ refers to the vector sum of the transverse components of calorimeter energy deposits not included in any reconstructed jets, and $\sum E_{T,\mathrm{uncl}}$ is the sum of the magnitudes of these unclustered energies. The angle between the projections in the $r\phi$ plane of a jet and $\vec{\not\!\!E}_T$ is denoted $\Delta\phi_{\mathrm{jet},\not E_T}$, and the angle between the projections in the $r\phi$ plane of $\vec{E}_{T,\mathrm{uncl}}$ and $\vec{\not\!\!E}_T$ is denoted $\Delta\phi_{\vec{E}_{T,\mathrm{uncl}},\not E_T}$. When the energies in Equation (4) are measured in GeV, $\not\!\!E_{T,\mathrm{sig}}$ is an approximate significance, since the denominator approximates the dispersion of the measured $\not\!\!E_T$ in events with no true $\not\!\!E_T$. Central electron events are required to have $\not\!\!E_{T,\mathrm{sig}} > 3.5 - 0.05\,M_T$ and $\not\!\!E_{T,\mathrm{sig}} > 2.5 - 3.125\,\Delta\phi_{\mathrm{jet2},\not E_T}$, where jet 2 is the jet with the second-largest $E_T$, and all energies are measured in GeV. Plug electron events must have $\not\!\!E_{T,\mathrm{sig}} > 2$ and $\not\!\!E_T > 45 - 30\,\Delta\phi_{\mathrm{jet},\not E_T}$ for all jets in the event. These requirements substantially reduce the contamination from non-W events, as shown in the plots in Fig. 5.
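The central (CEM) and plug (PHX) electron requirements just listed can be collected into one predicate. This is a sketch under assumptions: $\not\!\!E_{T,\mathrm{sig}}$ is taken as an already computed input, angles are in radians, jets are supplied in decreasing $E_T$ order, and the helper names are hypothetical. Thresholds for the other lepton categories are not quoted here, so the sketch passes them through unchanged.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal separation folded into [0, pi]."""
    d = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return abs(d)

def passes_met_sig_cuts(category, met, met_sig, mt, dphi_jets):
    """Non-W rejection cuts quoted in the text (GeV, radians).

    dphi_jets: angle between each jet and the missing E_T vector, ordered
    by decreasing jet E_T, so dphi_jets[1] belongs to jet 2.
    """
    if category == "CEM":
        return (met_sig > 3.5 - 0.05 * mt
                and met_sig > 2.5 - 3.125 * dphi_jets[1])
    if category == "PHX":
        return met_sig > 2.0 and all(met > 45.0 - 30.0 * d for d in dphi_jets)
    return True  # other categories: these particular cuts are not quoted
```

The angular terms loosen the cuts when the missing $E_T$ points away from the jets, that is, when it is less likely to come from a mismeasured jet.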

To remove events containing Z bosons, we reject events in which the trigger lepton candidate can be paired with an oppositely charged track such that the invariant mass of the pair is within the range 76 GeV/$c^2$ < $m_{\ell,\mathrm{track}}$ < 106 GeV/$c^2$. Additionally, if the trigger lepton candidate is identified as an electron, the event is rejected if a cluster is found in the electromagnetic calorimeter that, when paired with the trigger lepton candidate, forms an invariant mass in the same range.
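The Z veto reduces to an invariant-mass window test on lepton-track and lepton-cluster pairs. In this minimal sketch both objects are treated as massless (so $m^2 = 2p_{T1}p_{T2}[\cosh\Delta\eta - \cos\Delta\phi]$), the input lists are assumed to exclude the lepton's own track and cluster, and the function names are hypothetical.

```python
import math

def pair_mass(a, b):
    """Invariant mass of two massless objects given as (pt, eta, phi)."""
    pt1, eta1, phi1 = a
    pt2, eta2, phi2 = b
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m2, 0.0))

def is_z_like(lepton, lepton_charge, tracks, em_clusters, is_electron,
              lo=76.0, hi=106.0):
    """True if the event should be rejected by the Z veto.

    tracks: (pt, eta, phi, charge) for tracks other than the lepton's own;
    em_clusters: (pt, eta, phi) for EM clusters other than the lepton's own.
    """
    for pt, eta, phi, charge in tracks:
        # Only oppositely charged tracks can pair with the lepton.
        if charge == -lepton_charge and lo < pair_mass(lepton, (pt, eta, phi)) < hi:
            return True
    if is_electron:
        # Electron candidates are also paired with EM calorimeter clusters.
        return any(lo < pair_mass(lepton, c) < hi for c in em_clusters)
    return False
```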



IV. SIGNAL MODEL 



In order to perform a search for a previously undetected signal such as single top quark production, accurate models predicting the characteristics of the expected data are needed for both the signal being tested and the SM background processes. This analysis uses Monte Carlo programs to generate simulated events for each signal and background process, except for non-W QCD multijet events, for which events from data control samples are used.



A. s-channel Single Top Quark Model 

The matrix element generator MADEVENT [?] is used to produce simulated events for the signal samples. The generator is interfaced to the CTEQ5L [51] parameterization of the parton distribution functions (PDFs). The PYTHIA [?] program is used to perform the parton shower and hadronization. Although MADEVENT uses only a leading-order matrix element calculation, studies [?] indicate that the kinematic distributions of s-channel events are only negligibly affected by NLO corrections. The parton shower simulates the higher-order effects of gluon radiation and the splitting of gluons into quarks, and the Monte Carlo samples include contributions from initial-state sea quarks via the proton PDFs.
[Figure 5 panels: CEM W+jets MC Sample; CEM Observed Data; CEM Difference]

FIG. 5: Plots of $\not\!\!E_{T,\mathrm{sig}}$ vs. $M_T$ for W+jets Monte Carlo, the selected data in the $\ell + \not\!\!E_T$ + 2 jets sample, and the difference of the two distributions, for all CEM candidates. The black lines indicate the requirements which are applied. Events with lower $\not\!\!E_{T,\mathrm{sig}}$ or $M_T$ are not selected.




FIG. 6: The two different t-channel processes considered in our signal model: (a) the 2 → 2 process and (b) the 2 → 3 process.



B. t-channel Single Top Quark Model

The t-channel process is more complicated. Several authors point out [10, 55–57] that the leading-order contribution to t-channel single top quark production, as modeled in parton-shower Monte Carlo programs, does not adequately represent the expected distributions of observable jets, which are better predicted by NLO calculations.

The leading-order process is a 2 → 2 process with a b quark in the initial state, $b + u \to d + t$, as shown in Fig. 6(a). For antitop quark production, the charge-conjugate processes are implied. A parton distribution function for the initial-state b quark is used for the calculation. Since flavor is conserved in the strong interaction, a $\bar{b}$ quark must be present in the event as well. In what follows, this $\bar{b}$ quark is called the spectator b quark. Leading-order parton shower programs create the spectator b quark through backward evolution following the DGLAP scheme [58–60]. Only the low-$p_T$ portion of the transverse momentum distribution of the spectator b quark is modeled well, while the high-$p_T$ tail is not estimated adequately [13]. In addition, the pseudorapidity distribution of the spectator b quark, as simulated by the leading-order process, is biased towards higher pseudorapidities than predicted by NLO theoretical calculations.

We improve the modeling of the t-channel single top quark process by using two samples: one for the leading 2 → 2 process, $b + q \to q' + t$, and a second one for the 2 → 3 process in which an initial-state gluon splits into $b\bar{b}$, $g + q \to q' + t + \bar{b}$. In the second process the spectator b quark is produced directly in the hard scattering described by the matrix element (Fig. 6(b)). This sample describes the most important NLO contribution to t-channel production and is therefore suitable for describing the high-$p_T$ tail of the spectator b quark $p_T$ distribution. This sample, however, does not adequately describe the low-$p_T$ portion of the spectator b quark spectrum. In order to construct a Monte Carlo sample which closely follows NLO predictions, the 2 → 2 process and the 2 → 3 process must be combined.

A joint event sample was created by matching the $p_T$ spectrum of the spectator b quark to the differential cross section predicted by the ZTOP program [10], which operates at NLO. The matched t-channel sample consists of 2 → 2 events for spectator b quark transverse momenta below a cutoff, called $K_T$, and of 2 → 3 events for transverse momenta above $K_T$. The rates of 2 → 2 and 2 → 3 Monte Carlo events are adjusted to ensure the continuity of the spectator b quark $p_T$ spectrum at $K_T$. The value of $K_T$ is adjusted until the predicted fraction of t-channel signal events with a detectable spectator b quark jet (with $p_T > 20$ GeV/c and $|\eta| < 2.8$) matches the prediction by ZTOP. We obtain $K_T = 20$ GeV/c. All detectable spectator b quarks with $p_T > 20$ GeV/c in the joint t-channel sample are therefore simulated using the 2 → 3 sample.
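The matching procedure can be illustrated with per-event spectator-b kinematics from the two samples. In this sketch, which is not the analysis implementation, continuity at $K_T$ is imposed by comparing both samples in the same narrow $p_T$ window around the cutoff and rescaling the 2 → 3 events by the ratio. The window half-width, the function names, and the candidate scan over $K_T$ are all assumptions; the target fraction would come from ZTOP.

```python
import numpy as np

def match_t_channel(pt22, eta22, pt23, eta23, k_t, half_width=2.0):
    """Combine 2->2 events below k_t with rescaled 2->3 events above it."""
    pt22, eta22 = np.asarray(pt22, float), np.asarray(eta22, float)
    pt23, eta23 = np.asarray(pt23, float), np.asarray(eta23, float)
    # Compare both samples in the same pT window around k_t; rescaling the
    # 2->3 sample by the ratio makes the combined spectrum continuous there.
    lo, hi = k_t - half_width, k_t + half_width
    n22 = np.count_nonzero((pt22 >= lo) & (pt22 < hi))
    n23 = np.count_nonzero((pt23 >= lo) & (pt23 < hi))
    if n23 == 0:
        raise ValueError("no 2->3 events near the cutoff")
    scale = n22 / n23
    keep22, keep23 = pt22 < k_t, pt23 >= k_t
    pt = np.concatenate([pt22[keep22], pt23[keep23]])
    eta = np.concatenate([eta22[keep22], eta23[keep23]])
    w = np.concatenate([np.ones(keep22.sum()), np.full(keep23.sum(), scale)])
    return pt, eta, w

def detectable_fraction(pt, eta, w, pt_min=20.0, eta_max=2.8):
    """Weighted fraction of spectator b quarks with pT > 20 GeV/c, |eta| < 2.8."""
    sel = (pt > pt_min) & (np.abs(eta) < eta_max)
    return w[sel].sum() / w.sum()

def choose_k_t(candidates, target_fraction, samples):
    """Pick the cutoff whose detectable fraction is closest to the NLO target."""
    def mismatch(k):
        pt, eta, w = match_t_channel(*samples, k_t=k)
        return abs(detectable_fraction(pt, eta, w) - target_fraction)
    return min(candidates, key=mismatch)
```

Here `samples` is the tuple `(pt22, eta22, pt23, eta23)`. In practice the two spectra would be large simulated samples, and the scan would run over a fine grid of cutoffs.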

Figure 7 illustrates the matching procedure and compares the outcome with the differential $p_T$ and $Q_\ell \cdot \eta$ cross sections of the spectator b quark, where $Q_\ell$ is the charge of the lepton from W boson decay. Both the falling $p_T$ spectrum of the spectator b quark and the slightly asymmetric shape of the $Q_\ell \cdot \eta$ distribution are well modeled by the matched MADEVENT sample. Figure 7(a) shows the $p_T$ distribution of the spectator b quark on a logarithmic scale. The combined sample of t-channel events has a much harder $p_T$ spectrum of spectator b quarks than the 2 → 2 sample alone provides. The tail of the distribution extends beyond 100 GeV/c, while the 2 → 2 sample predicts very few spectator b quarks with $p_T$ above 50 GeV/c.



C. Validation

It is important to evaluate quantitatively the modeling of single top quark events. We compare the kinematic distributions of the primary partons obtained from the s-channel and the matched t-channel MADEVENT samples to theoretical differential cross sections calculated with ZTOP [10]. We find, in general, very good agreement. For the t-channel process in particular, the pseudorapidity distributions of the spectator b quark in the two predictions are nearly identical, even though that variable was not used to match the two t-channel samples.

One can quantify the remaining differences between the Monte Carlo simulation and the theoretical calculation by assigning weights to simulated events. The weight is derived from a comparison of six kinematic distributions: the $p_T$ and the $\eta$ of the top quark and of the two highest-$E_T$ jets which do not originate from the top quark decay. In the case of t-channel production, we distinguish between b-quark jets and light-quark jets. The correlation between the different variables, parameterized by the covariance matrix, is determined from the simulated events generated by MADEVENT. We apply the single top quark event selection to the Monte Carlo events and add the weights. This provides an estimate of the deviation of the acceptance in the simulation from the NLO prediction. In the W + 2 jets sample we find a fractional discrepancy of (−1.8 ± 0.9)% (MC stat.) for the t-channel, implying that the Monte Carlo estimate of the acceptance is slightly higher than the NLO prediction. In the s-channel we find excellent agreement: (−0.3 ± 0.7)% (MC stat.). More details on the t-channel matching procedure and the comparison to ZTOP can be found in Refs. [?] and [?]. The general conclusion from our studies is that the MADEVENT Monte Carlo events faithfully represent the NLO single top quark production predictions. The matching procedure for the t-channel sample takes the main NLO effects into account. The remaining difference is covered by a systematic uncertainty of ±1% or ±2% on the acceptance for s- and t-channel events, respectively.

Recently, an even higher-order calculation of the t-channel production cross section and kinematic distributions has been performed [56, 57], treating the 2 → 3 process itself at NLO. The production cross section in this calculation remains unchanged, but a larger fraction of events have a high-$p_T$ spectator b quark within the detector acceptance. This calculation became available after the analyses described in this paper were completed. Its net effect is to slightly decrease the predicted t-channel signal rate in the dominant sample with two jets and one b tag, and to significantly raise the comparatively low signal prediction in the double-tagged samples and the three-jet samples; the two effects compensate each other. Thus, both the expected and the observed changes in the outcome are insignificant for the combined and the separate extraction of the signal cross section and significance.
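The acceptance comparison used in the validation can be expressed as the ratio of a reweighted selection fraction to an unweighted one. In this sketch the per-event weights are taken as given; their derivation from the six kinematic distributions and their covariance is not reproduced, and the function name is hypothetical. A negative result means that the unweighted Monte Carlo acceptance is higher than the NLO-reweighted one, the same sign convention as the quoted (−1.8 ± 0.9)%.

```python
import numpy as np

def acceptance_shift(weights, passed):
    """Fractional acceptance change when MC events are reweighted to NLO.

    weights: per-event NLO/MC weights (assumed already derived);
    passed:  boolean mask of events passing the event selection.
    """
    weights = np.asarray(weights, float)
    passed = np.asarray(passed, bool)
    acc_mc = passed.mean()                            # unweighted MC acceptance
    acc_nlo = weights[passed].sum() / weights.sum()   # reweighted acceptance
    return acc_nlo / acc_mc - 1.0
```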



D. Expected Signal Yields 

The number of expected events is given by

$$\nu = \sigma \cdot \varepsilon_{\mathrm{evt}} \cdot \mathcal{L}_{\mathrm{int}}, \qquad (5)$$

where $\sigma$ is the theoretically predicted cross section of the respective process, $\varepsilon_{\mathrm{evt}}$ is the event detection efficiency, and $\mathcal{L}_{\mathrm{int}}$ is the integrated luminosity. The predicted cross sections for t-channel and s-channel single top quark production are quoted in Section I. The integrated luminosity used for the analyses presented in this article is $\mathcal{L}_{\mathrm{int}} = 3.2$ fb$^{-1}$.
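As an arithmetic check of Equation (5), the sketch below multiplies the NLO cross sections quoted in Section I (0.88 pb for the s channel, 1.98 pb for the t channel) by the event detection efficiencies quoted at the end of this subsection and by 3.2 fb$^{-1}$. It performs only plain arithmetic with the unit conversion 1 fb$^{-1}$ = 1000 pb$^{-1}$. It does not reproduce the yield tables, which are broken down by jet and tag multiplicity, and the function name is hypothetical.

```python
PB_PER_INV_FB = 1000.0  # 1 fb^-1 = 1000 pb^-1

def expected_events(sigma_pb, eff_evt, lumi_inv_fb):
    """Eq. (5): nu = sigma * eps_evt * L_int, with sigma in pb and L in fb^-1."""
    return sigma_pb * eff_evt * lumi_inv_fb * PB_PER_INV_FB

nu_t = expected_events(1.98, 0.012, 3.2)   # t channel
nu_s = expected_events(0.88, 0.018, 3.2)   # s channel
```

If those efficiencies are read as totals over all selected samples, this gives roughly 76 t-channel and 51 s-channel events.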

The event detection efficiency is estimated by performing the event selection on the samples of simulated events. Control samples in the data are used to calibrate the efficiencies of the trigger, the lepton identification, and the b tagging. These calibrations are then applied to the Monte Carlo samples we use.

We do not use a simulation of the trigger efficiency in the Monte Carlo samples. Instead, we calibrate the trigger efficiency using data collected with alternate trigger paths, and also using $Z \to \ell^+\ell^-$ events in which one lepton triggers the event and the other lepton is used to measure the fraction of the time it, too, triggers the event. We use these data samples to calculate the efficiency of the trigger for charged leptons as a function of the lepton's $E_T$ and $\eta$. The uncorrected Monte Carlo-based efficiency prediction, $\varepsilon_{\mathrm{MC}}$, is multiplied by the trigger efficiency $\varepsilon_{\mathrm{trig}}$.
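The trigger calibration can be sketched in two steps. The first step computes a binned ratio of probe leptons that fired the trigger to all probe leptons, in bins of $E_T$ and $\eta$. The second step is a per-event lookup that supplies $\varepsilon_{\mathrm{trig}}$ for a simulated event. The function names, the bin edges, and the choice of a plain histogram ratio (no smoothing or uncertainty propagation) are hypothetical.

```python
import numpy as np

def binned_trigger_efficiency(et, eta, fired, et_edges, eta_edges):
    """Fraction of probe leptons that fired the trigger, binned in (E_T, eta)."""
    et, eta = np.asarray(et, float), np.asarray(eta, float)
    fired = np.asarray(fired, bool)
    total, _, _ = np.histogram2d(et, eta, bins=[et_edges, eta_edges])
    passed, _, _ = np.histogram2d(et[fired], eta[fired], bins=[et_edges, eta_edges])
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(total > 0, passed / total, np.nan)  # NaN marks empty bins

def event_trigger_weight(lep_et, lep_eta, eff_map, et_edges, eta_edges):
    """eps_trig for one simulated event, looked up from the data-derived map."""
    i = np.clip(np.digitize(lep_et, et_edges) - 1, 0, len(et_edges) - 2)
    j = np.clip(np.digitize(lep_eta, eta_edges) - 1, 0, len(eta_edges) - 2)
    return eff_map[i, j]
```

Values outside the map are clipped to the nearest edge bin, which is one simple convention among several.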
FIG. 7: Matching of t-channel single top quark events of the 2 → 2 and the 2 → 3 process. The $p_T$ distributions of the spectator b quark are shown (a) on a logarithmic $p_T$ scale and (b) on a linear $p_T$ scale. The ratio of 2 → 2 to 2 → 3 events is adjusted such that the rate of spectator b quarks with $p_T > 20$ GeV/c and $|\eta| < 2.8$ matches the theoretical prediction. The fraction of these events is illustrated in (b) by the shaded area. The matched MADEVENT sample reproduces both the rate and the shape of the differential ZTOP $p_T$ (c) and $Q_\ell \cdot \eta$ (d) cross section distributions of the spectator b quark.

The efficiency of the selection requirements imposed to identify charged leptons is estimated with data samples of high-$p_T$ triggered leptons. In these events we look for oppositely charged tracks that, together with the triggered lepton, form the Z mass. The fraction of these tracks passing the lepton selection requirements gives the lepton identification efficiency. The Z vetoes in the single top quark candidate selection requirements ensure that our signal samples are orthogonal to the control samples we use to estimate the trigger and identification efficiencies.
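The lepton identification measurement is a tag-and-probe count. This sketch makes two hypothetical assumptions. First, the probe tracks have already been paired with the triggered lepton. Second, a 76–106 GeV/$c^2$ window defines "forming the Z mass"; this window is borrowed from the Z veto, since the text does not quote the one used here. The function returns the fraction of probes that pass the identification requirements, with a simple binomial uncertainty.

```python
import math

def id_efficiency(probe_masses, probe_passes_id, window=(76.0, 106.0)):
    """Fraction of Z-mass probe tracks that pass lepton ID, with binomial error."""
    lo, hi = window
    in_z = [p for m, p in zip(probe_masses, probe_passes_id) if lo < m < hi]
    n = len(in_z)
    if n == 0:
        raise ValueError("no probes in the Z window")
    eff = sum(in_z) / n
    err = math.sqrt(eff * (1.0 - eff) / n)
    return eff, err
```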

A similar strategy is adopted for using the data to calibrate the b-tag efficiency. At LEP, for example, single- and double-b-tagged events were used [11] to extract the b-tag efficiency and the b-quark fraction in Z decay. Jet formation in $p\bar{p}$ collisions involves many more processes, however, and their precise rates are poorly predicted. A jet originating from a b quark produced in a hard scattering process, for example, may recoil against another b jet, or it may recoil against a gluon jet. The invariant mass requirement used in the lepton identification procedure to purify a sample of Z decays is not useful for isolating a sample of $Z \to b\bar{b}$ decays because of the low signal-to-background ratio [64].
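The LEP double-tag idea mentioned above can be shown in its textbook, idealized form. That form neglects non-b tags and hemisphere correlations, so it is not the procedure used in this analysis. The single-hemisphere tag fraction is $f_s = R_b\,\varepsilon_b$ and the double-tag fraction is $f_d = R_b\,\varepsilon_b^2$, so the efficiency and the b fraction follow from the counts alone. The function name is hypothetical.

```python
def double_tag_solution(n_events, n_tagged_hemispheres, n_double_tagged):
    """Idealized double-tag method: solve f_s = R_b*eps and f_d = R_b*eps**2."""
    f_s = n_tagged_hemispheres / (2.0 * n_events)   # two hemispheres per event
    f_d = n_double_tagged / n_events
    eff_b = f_d / f_s
    r_b = f_s ** 2 / f_d
    return eff_b, r_b
```

In $p\bar{p}$ collisions, the unknown mixture of recoil partons breaks the assumption that each event contains two b hemispheres with a single common rate, which is why this approach does not carry over directly.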

We surmount these challenges and calibrate the b-tag efficiency in the data using the method described in Ref. [?], which is briefly summarized here. We select dijet events in which one jet is tagged with the SECVTX algorithm and the other jet contains an identified electron candidate with a large transverse momentum with respect to the jet axis, to take advantage of the characteristic semileptonic decays of B hadrons. The purity of $b\bar{b}$ events in this sample is nearly unity. We determine the flavor fractions in the jets containing electron candidates by fitting the distribution of the invariant mass of the reconstructed displaced vertices to templates for b jets, charm jets, and light-flavor jets, in order to account for the presence of non-b contamination.
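The flavor-fraction fit can be sketched as a linear template decomposition of the vertex-mass histogram. For brevity this sketch uses an unconstrained least-squares solve rather than the binned likelihood fit (with non-negative yields) that one would normally use. The template names and the function name are illustrative assumptions.

```python
import numpy as np

def fit_flavor_fractions(data_hist, templates):
    """Least-squares sketch: data ~ sum_k n_k * template_k (unit-normalized).

    templates: dict mapping a flavor name to a vertex-mass histogram with the
    same binning as data_hist. Returns the fitted fraction of each flavor.
    """
    names = list(templates)
    t = np.column_stack([np.asarray(templates[n], float) / np.sum(templates[n])
                         for n in names])
    yields, *_ = np.linalg.lstsq(t, np.asarray(data_hist, float), rcond=None)
    return dict(zip(names, yields / yields.sum()))
```

For example, a histogram built as 60% b, 30% charm, and 10% light-flavor templates returns those fractions.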

The fraction of jets containing electrons that pass the SECVTX tag is used to calibrate the SECVTX tagging efficiency of b jets which contain electrons. This efficiency is compared with that of b jets passing the same selection requirements in the Monte Carlo, and the ratio of the efficiencies is applied to the Monte Carlo efficiency for all b jets. Systematic uncertainties are assessed to cover possible differences in the Monte Carlo modeling of semileptonic and inclusive B hadron jets. The b-tagging efficiency is approximately 45% per b jet from top quark decay, for b jets which have at least two tracks and $|\eta| < 1$. The ratio between the data-derived efficiency and the Monte Carlo prediction shows no noticeable dependence on the jet's $|\eta|$ or $E_T$.



The differences in the lepton identification efficiency and the b tagging between the data and the simulation are accounted for by a correction factor $\varepsilon_{\mathrm{corr}}$ on the single top quark event detection efficiency. Separate correction factors are applied to the single-b-tagged events and the double-b-tagged events. Systematic uncertainties on the signal acceptance are assessed from the uncertainties on these correction factors.
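One common way to build separate single- and double-tag corrections from a per-jet data/MC scale factor is to compare tag-multiplicity probabilities for independently tagged jets. The text does not state that this is how $\varepsilon_{\mathrm{corr}}$ is computed, so treat the sketch below as a hypothetical illustration. The function names, the assumption of independent tags, and the optional lepton-ID factor are all assumptions.

```python
def tag_multiplicity_probs(effs):
    """P(exactly k tags), k = 0..n, for jets tagged independently with probs effs."""
    probs = [1.0]
    for e in effs:
        nxt = [0.0] * (len(probs) + 1)
        for k, p in enumerate(probs):
            nxt[k] += p * (1.0 - e)     # this jet is not tagged
            nxt[k + 1] += p * e         # this jet is tagged
        probs = nxt
    return probs

def correction_factor(mc_effs, scale_factors, category, sf_lepton=1.0):
    """Hypothetical eps_corr: data/MC ratio of the tag-multiplicity probability."""
    data_effs = [e * sf for e, sf in zip(mc_effs, scale_factors)]
    p_mc = tag_multiplicity_probs(mc_effs)
    p_data = tag_multiplicity_probs(data_effs)
    if category == "single":            # exactly one tagged jet
        ratio = p_data[1] / p_mc[1]
    elif category == "double":          # two or more tagged jets
        ratio = sum(p_data[2:]) / sum(p_mc[2:])
    else:
        raise ValueError("category must be 'single' or 'double'")
    return sf_lepton * ratio
```

With two b jets of MC efficiency 0.5 and a scale factor of 0.9, the single-tag correction is 0.99 and the double-tag correction is 0.81. This shows why the two categories need separate factors.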

The samples of simulated events are produced such that the W boson emerging from top quark decay is only allowed to decay into leptons, that is, $e\nu_e$, $\mu\nu_\mu$, and $\tau\nu_\tau$. Tau lepton decay is simulated with TAUOLA [65]. The value of $\varepsilon_{\mathrm{MC}}$, the fraction of all signal MC events passing our event selection requirements, is multiplied by the branching fraction of W bosons into leptons, $\varepsilon_{\mathrm{BR}} = 0.324$. The selection efficiencies for events in which the W boson decays to electrons and muons are similar, but the selection efficiency for $W \to \tau\nu_\tau$ decays is lower. This is because many tau decays contain no electron or muon, and also because the $p_T$ spectrum of tau decay products is softer than those of electrons and muons. In total, the event detection efficiency is given by

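$$\varepsilon_{\mathrm{evt}} = \varepsilon_{\mathrm{MC}} \cdot \varepsilon_{\mathrm{BR}} \cdot \varepsilon_{\mathrm{corr}} \cdot \varepsilon_{\mathrm{trig}}. \qquad (6)$$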

Including all trigger and identification efficiencies, we find $\varepsilon_{\mathrm{evt}}$(t-channel) = (1.2 ± 0.1)% and $\varepsilon_{\mathrm{evt}}$(s-channel) = (1.8 ± 0.1)%. The predicted signal yields for the selected two- and three-jet events with one and two (or more) b-tagged jets are listed in Tables II and III.
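Equation (6) is a plain product, so a quoted $\varepsilon_{\mathrm{evt}}$ can also be inverted to show how much of it comes from everything except the leptonic branching fraction. This sketch uses only $\varepsilon_{\mathrm{BR}} = 0.324$ and the two efficiencies quoted above; the function names are hypothetical.

```python
EPS_BR = 0.324  # BR(W -> e nu) + BR(W -> mu nu) + BR(W -> tau nu), from the text

def event_efficiency(eps_mc, eps_corr, eps_trig, eps_br=EPS_BR):
    """Eq. (6): eps_evt = eps_MC * eps_BR * eps_corr * eps_trig."""
    return eps_mc * eps_br * eps_corr * eps_trig

def implied_non_br_product(eps_evt, eps_br=EPS_BR):
    """The product eps_MC * eps_corr * eps_trig implied by a quoted eps_evt."""
    return eps_evt / eps_br
```

For the quoted values, the combined selection, correction, and trigger factor is about 3.7% for the t channel (0.012/0.324) and about 5.6% for the s channel (0.018/0.324).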



V. BACKGROUND MODEL 

The final state of a single top quark event (a charged lepton, missing transverse energy from the undetected neutrino, and two or three jets with one or more B hadrons) is also the final state of the $Wb\bar{b}$ process, which has a much larger cross section. Other processes that produce similar final states, such as $Wc\bar{c}$ and $t\bar{t}$, also mimic the single top quark signature, either because of misreconstruction or because one or more components of the expected final state are lost. A detailed understanding of the rates and kinematic properties of the background processes is necessary in order to measure the single top quark production cross section accurately.

The largest background process is the associated production of a leptonically decaying W boson and two or more jets. Representative Feynman diagrams are shown in Fig. 8. The cross section for W+jets production is much larger than that of the single top quark signal, and the W+jets production cross sections are difficult to calculate theoretically. Furthermore, W+jets events can be kinematically quite similar to the signal events we seek, and when the jets contain b quarks, the final state can be identical to that of single top quark production. The narrow top quark width, the lack of resonant structure in W+jets events, and color suppression make the quantum-mechanical interference between the signal and the background very small.

FIG. 8: Some representative diagrams of W+jets production. The production cross sections of these processes are much larger than that of single top quark production.

FIG. 9: Feynman diagrams of the $t\bar{t}$ background to single top quark production. To pass the event selection, these events must have one charged lepton (a), or one or two hadronic jets (b), that go undetected.

Top quark pair production, in which one or two jets or one charged lepton have been lost, also constitutes an important background process (Fig. 9). There are also contributions from several other sources: the diboson production processes WW, WZ, and ZZ, which are shown in Fig. 10; $Z/\gamma^*$+jets processes in which one charged lepton from Z boson decay is missed (Fig. 11(a)); and QCD multijet events, which do not contain W bosons but instead have a fake lepton and mismeasured $\not\!\!E_T$ (Fig. 11(b)). The rates and kinematic properties of these processes must be carefully modeled and validated with data in order to make a precise measurement of single top quark production.

Because there are many different background processes, we use a variety of methods to predict the background rates. Some are purely based on Monte Carlo simulations scaled to high-order predictions of the cross section (such as $t\bar{t}$); some are purely data-based (non-W); and some require a combination of Monte Carlo and data (W+jets).



A. Monte Carlo Based Background Processes 

We use samples of simulated Monte Carlo events to estimate the contributions of $t\bar{t}$, diboson, and $Z/\gamma^*$+jets production to the b-tagged lepton+jets sample. The corresponding event detection efficiencies $\varepsilon_{\mathrm{evt}}$ are calculated in the same way as for the single top quark processes described in Section IV and Equation (6). We apply Equation (5) to calculate the final number of expected events. This method therefore requires that each physical process be theoretically well understood, i.e., that its kinematics be well described in simulated events and that its cross section be well known.

FIG. 10: Feynman diagrams for diboson production, which provides a small background for single top quark production.

FIG. 11: Representative Feynman diagrams for (a) $Z/\gamma^*$+jets production and (b) non-W events, in which a jet has to be misidentified as a lepton and $\not\!\!E_T$ must be mismeasured to pass the event selection.

To model the $t\bar{t}$ production contribution to our selected samples, we use PYTHIA [54] Monte Carlo samples, scaled to the NLO theoretical cross section prediction [?, 63] of $\sigma_{t\bar{t}} = (6.70 \pm 0.83)$ pb, assuming $m_t = 175$ GeV/$c^2$. The systematic uncertainty contains a component which covers the differences between the chosen calculation and others [?]. The event selection efficiencies and the kinematic distributions of $t\bar{t}$ events are predicted using these PYTHIA samples. Because the Monte Carlo efficiencies for lepton identification and b tagging differ from those observed in the data, the $t\bar{t}$ efficiencies estimated from the Monte Carlo are adjusted by factors $\varepsilon_{\mathrm{corr}}$, which are functions of the numbers of leptonically decaying W bosons and b-tagged jets.

To estimate the expected number of diboson events in our selected data sample, we use the theoretical cross sections predicted for a center-of-mass energy of $\sqrt{s} = 2.00$ TeV using the MCFM program [?] and extrapolate the values to $\sqrt{s} = 1.96$ TeV. This leads to $\sigma_{WW} = (13.30 \pm 0.80)$ pb, $\sigma_{WZ} = (3.96 \pm 0.34)$ pb, and $\sigma_{ZZ} = (1.57 \pm 0.21)$ pb. The cross section uncertainties reported in [?] are smaller than those obtained with MCFM Version 5.4; we quote here the larger uncertainties. The event selection efficiencies and the kinematic distributions of diboson events are estimated with PYTHIA Monte Carlo samples, with corrections applied to bring the lepton identification and b-tagging efficiencies in line with those estimated from data samples.

Events with $Z/\gamma^*$ boson production in association with jets are simulated using ALPGEN [?], with PYTHIA used to model the parton shower and hadronization. The $Z/\gamma^*$+jets cross section is normalized to that measured by CDF in the $Z/\gamma^*(\to e^+e^-)$+jets sample [?], within the kinematic range of the measurement, separately for the different numbers of jets. Lepton universality is assumed in Z decay.



B. Non-W Multijet Events

Estimating the non-W multijet contribution to the sample is challenging because these events are difficult to simulate. A variety of QCD processes produce copious amounts of multijet events, but only a tiny fraction of these events pass our selection requirements. In order for an event lacking a leptonic W boson decay to be selected, it must have a fake lepton or a real lepton from a heavy-flavor quark decay. In the same event, the $\not\!\!E_T$ must be mismeasured. The rate at which fake leptons are reconstructed and the amount of mismeasured $\not\!\!E_T$ are difficult to model reliably in Monte Carlo.

The non-W background is modeled by selecting data samples which have less stringent selection requirements than the signal sample. These samples, which are described below, are dominated by non-W events whose kinematic distributions are similar to those of the non-W contribution to the signal sample. The normalization of the non-W prediction is separately determined by fitting templates of the $\not\!\!E_T$ distribution to the data sample.

We use three different data samples to model the non-W multijet contributions. One sample is based on the principle that non-W events must have a jet which passes all lepton identification requirements. A data sample of inclusive jets is subjected to all of our event selection requirements except the lepton identification requirements. In lieu of an identified lepton, a jet is required with $E_T > 20$ GeV. This jet must contain at least four tracks, in order to reduce contamination from real electrons from W and Z boson decay, and 80–95% of the jet's total calorimetric energy must be in the electromagnetic calorimeter, in order to simulate a misidentified electron. The b-tagging requirement on other jets in the event is relaxed to requiring a taggable jet instead of a tagged jet, in order to increase the size of the selected sample. A taggable jet is one that is within the acceptance of the silicon tracking detector and which has at least two tracks in it. This sample is called the jet-based sample.
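The jet-based selection can be sketched as a few predicates on jet records. The `Jet` fields are hypothetical, the 80–95% range is treated as inclusive (the text does not say), and all other event selection requirements are omitted.

```python
from dataclasses import dataclass

@dataclass
class Jet:
    """Hypothetical jet record; field names are illustrative only."""
    et: float                     # transverse energy in GeV
    n_tracks: int                 # tracks associated with the jet
    em_fraction: float            # fraction of calorimeter energy in the EM section
    in_silicon_acceptance: bool   # within the silicon tracker acceptance

def is_fake_electron_jet(jet):
    """Jet standing in for the lepton: E_T > 20 GeV, >= 4 tracks, 80-95% EM."""
    return jet.et > 20.0 and jet.n_tracks >= 4 and 0.80 <= jet.em_fraction <= 0.95

def is_taggable(jet):
    """Inside the silicon acceptance with at least two tracks."""
    return jet.in_silicon_acceptance and jet.n_tracks >= 2

def jet_based_event(jets):
    """True if one jet can play the fake electron and another jet is taggable."""
    for i, cand in enumerate(jets):
        if is_fake_electron_jet(cand):
            others = jets[:i] + jets[i + 1:]
            if any(is_taggable(j) for j in others):
                return True
    return False
```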

The second sample takes advantage of the fact that fake leptons from non-W events have difficulty passing the lepton selection requirements. We look at lepton candidates in the central electron trigger that fail at least two of five identification requirements that do not depend on the kinematic properties of the event, such as the fraction of energy in the hadronic calorimeter. These objects are treated as leptons, and all other selection requirements are applied. This sample has the advantage of having the same kinematic properties as the central electron sample. This sample is called the ID-based sample.

The two samples described above are designed to model events with misidentified electron candidates. Because of the similarities in the kinematic properties of the ID-based and the jet-based events, we use the union of the jet-based and ID-based samples as our non-W model for triggered central electrons (the CEM sample). Remarkably, the same samples also simulate the kinematics of events with misidentified triggered muon candidates; we use the samples again to model those events (the CMUP and CMX samples). The jet-based sample alone is used to model the non-W background in the PHX sample because its angular coverage is greater.

The kinematic distributions of the reconstructed objects in the EMC sample are different from those in the CEM, PHX, CMUP, and CMX samples due to the trigger requirements, and thus a separate sample must be used to model the non-W background in the EMC data. This third sample consists of events that are collected with the $\not\!\!E_T$+jets trigger path and which have a muon candidate passing all selection requirements except for the isolation requirement. It is called the non-isolated sample.

The non-W background must be determined not only for the data sample passing the event selection requirements, but also for the control samples which are used to determine the W+jets backgrounds, as described in Sections V C and V D. The expected numbers of non-W events are estimated in pretag events, i.e., events in which all selection criteria are applied except the secondary vertex tag requirement. We require that at least one jet in a pretagged event be taggable. In order to estimate the non-W rates in this sample, we also remove the $\not\!\!E_T$ event selection requirement, but we retain all other non-W rejection requirements. We fit templates of the $\not\!\!E_T$ distributions of the W+jets and the non-W samples to the spectra of the pretag data, holding constant the normalizations of the additional templates needed to model the small diboson, $t\bar{t}$, Z+jets, and single top backgrounds. The fractions of non-W events are then calculated in the sample with $\not\!\!E_T > 25$ GeV. The inclusion or omission of the single top contribution in these fits has a negligible impact on the fitted non-W fractions. These fits are performed separately for each lepton category (CEM, PHX, CMUP, CMX, and EMC) because the instrumental fake lepton fractions are different for electrons and muons, and for the different detector components. In all lepton categories except PHX, the full $\not\!\!E_T$ spectrum is used in the fit. For the PHX electron sample, we require $\not\!\!E_T > 15$ GeV in order to minimize sensitivity to the trigger. The fits in the pretag region are also used to estimate the W+jets contribution in the pretag region, as described in Section V C. As Fig. 12 shows, the resulting fits describe the data quite well.
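The $\not\!\!E_T$ template fit can be sketched with two free normalizations (W+jets and non-W) and the small backgrounds held fixed, followed by computing the non-W fraction above the 25 GeV threshold. This is a simplified illustration under stated assumptions. It uses least squares in place of a binned likelihood, its templates are unit-normalized shapes over the fit range, its binning is assumed to have an edge exactly at the threshold, and its function name is hypothetical.

```python
import numpy as np

def fit_non_w_fraction(data, w_jets, non_w, fixed_others, bin_edges, met_cut=25.0):
    """Fit W+jets and non-W normalizations with other backgrounds held fixed.

    data, fixed_others: event counts per MET bin;
    w_jets, non_w: unit-normalized template shapes per MET bin.
    Returns the predicted non-W fraction in bins above met_cut.
    """
    others = np.asarray(fixed_others, float)
    residual = np.asarray(data, float) - others
    t = np.column_stack([np.asarray(w_jets, float), np.asarray(non_w, float)])
    (n_w, n_qcd), *_ = np.linalg.lstsq(t, residual, rcond=None)
    above = np.asarray(bin_edges[:-1], float) >= met_cut  # bins fully above the cut
    pred_w = n_w * t[above, 0].sum()
    pred_qcd = n_qcd * t[above, 1].sum()
    return pred_qcd / (pred_w + pred_qcd + others[above].sum())
```

Because the non-W template peaks at low $\not\!\!E_T$, most of its fitted yield lies below the threshold, and the fraction that survives the $\not\!\!E_T > 25$ GeV requirement is small.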

Estimates of the non-W yields in the tagged samples used to search for the single top signal are also needed. These samples are more difficult because the non-W modeling samples are too small to apply tagging directly: only a few events pass the secondary vertex requirement. However, since the data show no dependence of the b-tagging rate on $\not\!\!E_T$, we use the untagged non-W templates in the fits to the $\not\!\!E_T$ distributions in the tagged samples. These fits are used to extract the non-W fractions in the signal samples. As before, the Monte Carlo predictions of diboson, $t\bar{t}$, Z+jets, and single top production are held constant, and only the normalizations of the W+jets and the non-W templates are allowed to float. The resulting shapes are shown in Fig. 13 for the single-tagged sample, and these are used to derive the non-W fractions in the signal samples. As before, the inclusion or omission of the single top contributions in the fits has a negligible effect on the fitted non-W fractions. Because of the uncertainties in the tagging rates, the template shapes, and the estimation methods, the estimated non-W rates are given systematic uncertainties of ±40% in single-tagged events and ±80% in double-tagged events. These uncertainties cover the differences in the results obtained by fitting variables other than $\not\!\!E_T$, changing the histogram binning, varying the fit range, and using alternative samples to model the non-W background. The uncertainty in the double-tagged non-W prediction is larger because of the larger statistical uncertainty arising from the smaller size of the double-tagged sample.



C. W+Heavy Flavor Contributions

Events with a W boson accompanied by heavy flavor production constitute the majority of the b-tagged lepton+jets sample. These processes are $Wb\bar{b}$, shown in Fig. 8(a); $Wc\bar{c}$, which is the same process as $Wb\bar{b}$ but with charm quarks replacing the b quarks; and $Wcj$, which is shown in Fig. 8(b). Each process may be accompanied by more jets and pass the event selection requirements for the W + 3 jets signal sample. Jets may fail to be detected, or they may fail to pass our selection requirements, and such events may fall into the W + 1 jet control sample. While these events can be simulated using the ALPGEN generator, the theory uncertainties on the cross sections of these processes remain large compared with the size of the single top quark signal [72–79]. These large a priori uncertainties on the background predictions, together with the small signal-to-background ratios in the selected data samples, are why we must use advanced analysis techniques to further purify the signal. We also use the data themselves, both in control samples and in situ in the samples passing all selection requirements, to constrain the background rates and thereby reduce their systematic uncertainties. The in situ fits are described in Section IX, and the control sample fits are described below.

The control samples used to estimate the W+heavy flavor predictions and uncertainties are the pretagged W + n jets samples and the tagged W + 1 jet sample. We use the ALPGEN+PYTHIA Monte Carlo model to extrapolate the measurements in the control samples to make predictions of the W+heavy flavor background contributions in the data samples passing our signal selection requirements. The pretagged W + n jets samples are used to scale the ALPGEN predictions, and the tagged W + 1 jet sample is used to check and adjust ALPGEN's predictions of the fractions of W+jets events which are $Wb\bar{b}$, $Wc\bar{c}$, and $Wcj$ events. A full description of the method follows.

FIG. 12: Fits to $\not\!\!E_T$ distributions in the pretag samples for the five different lepton categories (CEM, PHX, CMUP, CMX, EMC) in W + 2 jet events. The fractions of non-W events are estimated from the portions of the templates above the $\not\!\!E_T$ thresholds shown by the arrows. Overflows are collected in the highest bin of each histogram. The data are indicated with points with error bars, and the shaded histograms show the best-fit predictions. The non-W templates are not shown stacked, but the W+jets and "Others" templates are stacked. The unshaded histogram is the sum of the fitted shapes.

FIG. 13: Fits to $\not\!\!E_T$ distributions in the single-tagged sample for the five different lepton categories (CEM, PHX, CMUP, CMX, EMC) in W + 2 jet events. The fraction of non-W events is estimated from the fraction of the template above the $\not\!\!E_T$ threshold shown by the arrows. Overflows are collected in the highest bin of each histogram. The data are indicated with points with error bars, and the shaded histograms show the best-fit predictions. The non-W template is not shown stacked, but the W+jets and "Others" templates are stacked. The unshaded histogram is the sum of the fitted shapes.

The number of pretag W+jets events is estimated by assuming that all events not accounted for by the Monte Carlo-based predictions (the tt and diboson predictions; the single top quark signal is a negligible component of the pretag sample) or by the non-W multijet estimate are W+jets events. That is:

$$N^{\rm pretag}_{W+{\rm jets}} = N^{\rm pretag}_{\rm data}\left(1 - f^{\rm pretag}_{\text{non-}W}\right) - N^{\rm pretag}_{\rm MC}, \tag{7}$$

where $N^{\rm pretag}_{\rm data}$ is the number of observed events in the pretag sample, $f^{\rm pretag}_{\text{non-}W}$ is the fraction of non-W events in the pretag sample, as determined from the fits described in Section V B, and $N^{\rm pretag}_{\rm MC}$ is the expected number of pretag tt and diboson events. ALPGEN typically underestimates the inclusive W+jets rates by a factor of roughly 1.4 [dS]. To estimate the yields of Wbb, Wcc, and Wcj events, we multiply this data-driven estimate of the W+jets yield by heavy-flavor fractions.
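The bookkeeping in Equation (7), followed by the multiplication by heavy-flavor fractions, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code: the function names, the heavy-flavor fractions, and the event counts are hypothetical placeholders, and the correction factors (1.4 for Wbb and Wcc, 1.0 for Wcj) are simply the values quoted later in this section.

```python
def pretag_w_jets_yield(n_data, f_non_w, n_mc):
    """Equation (7): pretag events not attributed to non-W multijets or to the
    MC-predicted processes (tt, diboson) are assumed to be W+jets."""
    return n_data * (1.0 - f_non_w) - n_mc


def heavy_flavor_yields(n_w_jets, hf_fractions, correction_factors):
    """Scale the data-driven W+jets yield by heavy-flavor fractions, each
    multiplied by a data-derived correction factor (default 1.0)."""
    return {
        process: n_w_jets * fraction * correction_factors.get(process, 1.0)
        for process, fraction in hf_fractions.items()
    }


# Hypothetical inputs, for illustration only.
n_w_jets = pretag_w_jets_yield(n_data=10_000, f_non_w=0.10, n_mc=500.0)
hf_yields = heavy_flavor_yields(
    n_w_jets,
    hf_fractions={"Wbb": 0.02, "Wcc": 0.03, "Wcj": 0.04},
    correction_factors={"Wbb": 1.4, "Wcc": 1.4, "Wcj": 1.0},
)
print(n_w_jets, hf_yields)
```

Keeping the correction factors in a separate dictionary mirrors the text: the fractions come from the simulation, while the factors are the data-driven adjustments.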

The heavy-flavor fractions in W+jets events are also not well predicted by our ALPGEN+PYTHIA model. To improve the modeling of these fractions, we perform fits to templates of flavor-separating variables in the b-tagged W+1 jet data sample, which contains a vanishingly small component of single top quark signal events and is not otherwise used in the final signal extraction procedure. This sample is large and is almost entirely composed of W+jets events. We include Monte Carlo models of the small contributions from tt and diboson events as separate templates, normalized to their SM expected rates, in the fits to the data. Care must be exercised in estimating the W+heavy flavor fractions, because fitting in the W+1 jet sample and using the fit values for the W+2 jet and W+3 jet samples is an extrapolation. We therefore estimate the b and charm fractions in these events with as many independent methods as possible, and we assign generous uncertainties that cover the differences between the several estimates of the rates.

We fit the distribution of the jet-flavor separator b_NN described in Section VI. Template distributions are created from ALPGEN+PYTHIA Monte Carlo samples for the W+LF, Wcc, Wcj, Wbb, and Z/γ*+jets processes, where W+LF events are those in which none of the jets accompanying the leptonically decaying W boson contains a b or c quark. The template distributions for these five processes are shown in Fig. 14(a). The tt and diboson templates are created using PYTHIA Monte Carlo samples. The non-W model described in Section V B is also used. The W+LF template's rate is constrained within its uncertainty by the data-derived mistag estimate described in Section V D; the other W+jets templates' rates are not constrained. The tt, diboson, Z/γ*+jets, and non-W contributions are constrained within their uncertainties. The Wbb and Wcc components float in the fit but are scaled by the same factor, since the same diagrams, with b and c quarks interchanged, contribute in the ALPGEN model, and we expect a similar correspondence for the leading processes in the data. We also let the Wcj fraction float in the fit. The best fit in the W+1 jet sample is shown in Fig. 14(b).
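The structure of this fit, with one free factor shared by the Wbb and Wcc templates, one free factor for Wcj, and Gaussian constraints on the other components, can be illustrated with a simplified linear fit. The sketch below is an assumption-laden stand-in, not the fit used in the analysis: it minimizes a Neyman χ² (weights 1/n_i) plus Gaussian penalty terms, which has a closed-form solution, and the five-bin b_NN templates are invented numbers. Pseudo-data built with both free factors set to 1.4 are fitted back to those values.

```python
import numpy as np


def fit_scale_factors(data, templates, floating, constraints):
    """Minimize sum_i (n_i - mu_i)^2 / n_i + sum_k (theta_k - c_k)^2 / s_k^2.

    data        -- observed counts per bin
    templates   -- name -> expected counts per bin for a scale factor of 1
    floating    -- parameter -> list of template names it scales together
    constraints -- parameter -> (central value, width); absent means unconstrained
    """
    params = list(floating)
    # Column k of the design matrix is the summed shape multiplied by theta_k.
    a = np.column_stack([sum(templates[t] for t in floating[p]) for p in params])
    w = 1.0 / np.maximum(data, 1.0)  # Neyman weights, guarding empty bins
    lhs = a.T @ (w[:, None] * a)
    rhs = a.T @ (w * data)
    for k, p in enumerate(params):
        if p in constraints:
            center, width = constraints[p]
            lhs[k, k] += 1.0 / width**2
            rhs[k] += center / width**2
    # Normal equations of a quadratic objective: one linear solve gives the minimum.
    return dict(zip(params, np.linalg.solve(lhs, rhs)))


# Invented five-bin b_NN templates (low b_NN -> high b_NN).
templates = {
    "W+LF": np.array([50.0, 30.0, 15.0, 5.0, 2.0]),
    "Wcc": np.array([10.0, 15.0, 12.0, 6.0, 3.0]),
    "Wcj": np.array([12.0, 16.0, 10.0, 5.0, 2.0]),
    "Wbb": np.array([2.0, 4.0, 8.0, 15.0, 25.0]),
    "other": np.array([1.0, 2.0, 3.0, 4.0, 5.0]),
}
floating = {
    "k_lf": ["W+LF"],
    "k_bc": ["Wbb", "Wcc"],  # one factor shared by Wbb and Wcc
    "k_cj": ["Wcj"],
    "k_other": ["other"],
}
constraints = {"k_lf": (1.0, 0.1), "k_other": (1.0, 0.1)}

# Pseudo-data generated with k_bc = k_cj = 1.4 and the constrained terms at 1.
pseudo_data = (templates["W+LF"] + 1.4 * (templates["Wbb"] + templates["Wcc"])
               + 1.4 * templates["Wcj"] + templates["other"])
fit = fit_scale_factors(pseudo_data, templates, floating, constraints)
print({k: round(float(v), 3) for k, v in fit.items()})
```

A Poisson likelihood would be the more natural choice for sparse bins; the χ² form is used here only because its minimum can be written down without an iterative minimizer.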

The fit indicates that the ALPGEN-predicted Wbb + Wcc fraction must be multiplied by 1.4 ± 0.4 for the templates to match the data, and the best-fit value of the Wcj fraction is also larger than the ALPGEN prediction by a factor of 1.4 ± 0.4. In addition to the fit to the b_NN distribution, we also fit the W+heavy flavor fractions in the b-tagged W+1 jet sample with another variable, the reconstructed invariant mass of the secondary vertex. We perform this alternative fit in our standard b-tagged sample as well as in one with loosened b-tag requirements.

We obtain additional information from [81], in which a direct measurement of the Wc fraction is made using lepton charge correlations. The central value of this measurement agrees well with the Monte Carlo predictions. We therefore set the multiplicative factor of the Wc component to 1.0 ± 0.3 for use in the two- and three-jet bins.

The 30% uncertainties assessed on the Wbb + Wcc and Wcj yields cover the differences in the measured fit values and also approximate our uncertainty in extrapolating these fractions to W+2 and W+3 jet events. We check these extrapolations in the W+2 and W+3 jet events, as shown in Figs. 14(c) and 14(d); no additional fit is performed for this comparison. The rates and flavor compositions match the observed data very well in these samples. The uncertainties in the fit fractions arising from the uncertainties on the shapes of the b_NN templates discussed in Section VI are a negligible component of the total uncertainty.

Since the yields of W+heavy flavor events are estimated from b-tagged data using the same SECVTX algorithm as is used for the candidate event selection, the uncertainty in the b-tagging efficiency does not enter the prediction of these rates.



D. Rates of Events with Mistagged Jets 

Some W+LF events pass our event selection requirements due to the presence of mistagged jets. A mistagged jet is one that does not contain a weakly decaying B or charm hadron but nonetheless passes all of the secondary vertex tagging requirements of the SECVTX algorithm [31]. Jets are mistagged for several reasons: tracking errors such as hit misassignment or resolution effects cause the reconstruction of false secondary vertices, the multi-prong decays of long-lived particles such as the K⁰_S and the Λ⁰ supply real secondary vertices, and nuclear interactions with the detector material also provide a real source of non-b/c secondary vertices.

FIG. 14: Templates (a) of the jet flavor separator b_NN for W+light, W+charm (adding the Wcc and Wcj contributions because of their similar shapes), and W+bottom events. The template labeled "Other" represents the diboson and Z/γ*+jets contributions. The strong discrimination b_NN provides between jet flavors makes it a powerful variable in multivariate analyses. Panel (b) shows the outcome of the fit to the W+1 jet data sample allowing the b, c, and light-flavor components to float as described in Section V. Panels (c) and (d) compare the data and the corresponding predictions in the W+2 jet and W+3 jet samples. In panels (b) through (d), the data are indicated with points with error bars, and the model predictions are shown with shaded histograms, stacked in the same order as the legend.

The background yields from mistags caused by tracking resolution are estimated without the use of detector simulation. The procedure is to measure the fraction of jets that have negative decay lengths (defined below) in order to estimate the fraction of light-flavor jets that have incorrect positive decay lengths. This fraction is adjusted to account for the asymmetry between the negative and positive decay length distributions, and for the heavy-flavor contribution in the jet data, to obtain the mistag probability. This probability is multiplied by an estimate of the W+LF jet yield in each of our samples, separately for each lepton category and jet-number category. Each of these steps is described in detail below.

Events passing inclusive jet triggers with vertices with negative two-dimensional (2D) decay lengths comprise the control sample used to estimate the mistag rate. The 2D decay length L_xy is the magnitude of the displacement from the primary vertex to the reconstructed secondary vertex, projected first onto the plane perpendicular to the beam axis and then onto the projection of the jet axis in that plane. The sign is given by the sign of the dot product of the 2D decay length and the jet momentum. Tracking resolution effects are expected to produce a symmetric distribution, centered on zero, of the 2D decay length of misreconstructed light-flavor secondary vertices. A jet is said to be "negatively tagged" if the transverse decay length significance L_xy/σ_Lxy < -7.5, while L_xy/σ_Lxy > 7.5 defines a "positively tagged" jet.
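The sign convention for L_xy and the tag classification can be written out explicitly. The helpers below are a hypothetical illustration (the function names, the coordinate conventions, and the assumption that the beam runs along z are ours): projecting the transverse displacement of the secondary vertex onto the transverse direction of the jet yields the signed 2D decay length directly, and the ±7.5 significance thresholds quoted above are then applied.

```python
import math


def signed_lxy(primary_vertex, secondary_vertex, jet_px, jet_py):
    """Signed 2D decay length: the transverse (x, y) displacement from the primary
    to the secondary vertex, projected onto the jet direction in the x-y plane.
    Negative values mean the vertex lies behind the primary vertex relative to the jet."""
    dx = secondary_vertex[0] - primary_vertex[0]
    dy = secondary_vertex[1] - primary_vertex[1]
    jet_pt = math.hypot(jet_px, jet_py)
    if jet_pt == 0.0:
        raise ValueError("jet has no transverse momentum")
    return (dx * jet_px + dy * jet_py) / jet_pt


def tag_category(lxy, sigma_lxy, threshold=7.5):
    """Classify a vertex by its transverse decay length significance."""
    significance = lxy / sigma_lxy
    if significance > threshold:
        return "positive"
    if significance < -threshold:
        return "negative"
    return "untagged"


# Hypothetical vertices (cm) and jet momentum components (GeV/c).
forward = signed_lxy((0.0, 0.0, 0.0), (0.3, 0.4, 1.0), 3.0, 4.0)
backward = signed_lxy((0.0, 0.0, 0.0), (-0.3, -0.4, 1.0), 3.0, 4.0)
print(forward, tag_category(forward, 0.05), backward, tag_category(backward, 0.05))
```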

The per-jet mistag rate is not a single number; rather, it is parameterized as a function of six kinematic variables: the E_T and η of the jet, the number of tracks in the jet, the scalar sum of the transverse energies of the tight jets, the number of reconstructed primary vertices, and the z coordinate of the primary vertex associated with the jet. Since the negative tag rate does not fully reflect the positive mistags due to the decays of long-lived particles and interactions with the detector material, a correction factor αβ for the mistag asymmetry is applied. The factor α corrects for the asymmetry between the positive and negative tag rates of light-flavor jets, and the factor β corrects for the presence of b jets in the jet samples used to derive the mistag rate. These correction factors are extracted from fits to distributions of the invariant mass of the reconstructed secondary vertex in tagged jets in an inclusive jet sample. A systematic uncertainty is derived from fits to templates of pseudo-cτ, which is defined as L_xy · m/p_T [HI], where m is the invariant mass of the tracks in the displaced vertex and p_T is the magnitude of the vector sum of the transverse momenta of the tracks in the displaced vertex. The systematic uncertainty on the asymmetry factor αβ is the largest component of the uncertainty on the mistag estimate. Another component is estimated from the differences in the negative tag rates computed with different jet data samples with varying trigger requirements. The average rate for jets to be mistagged is approximately 1%, although it depends strongly on the jet E_T.
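How the asymmetry correction enters can be shown with a toy lookup. In this hedged sketch the six-variable parameterization is replaced by a hypothetical table binned only in jet E_T, the negative-tag rates and the value of αβ are invented, and the per-jet mistag probability is taken to be the negative-tag rate multiplied by αβ; pseudo-cτ is computed as defined above.

```python
from bisect import bisect_right

# Hypothetical negative-tag rates binned in jet E_T (GeV); the real
# parameterization uses six variables and these numbers are invented.
ET_LOW_EDGES = [20.0, 40.0, 60.0, 100.0]      # the last bin has no upper edge
NEGATIVE_TAG_RATE = [0.002, 0.004, 0.006, 0.009]


def negative_tag_rate(jet_et):
    """Look up the negative-tag rate for the E_T bin containing jet_et."""
    if jet_et < ET_LOW_EDGES[0]:
        return 0.0  # below the lowest bin: treated as not taggable
    return NEGATIVE_TAG_RATE[bisect_right(ET_LOW_EDGES, jet_et) - 1]


def mistag_probability(jet_et, alpha_beta):
    """Per-jet mistag probability: negative-tag rate times the asymmetry factor."""
    return alpha_beta * negative_tag_rate(jet_et)


def pseudo_ctau(lxy, vertex_mass, vertex_pt):
    """pseudo-c*tau = L_xy * m / p_T of the tracks in the displaced vertex."""
    return lxy * vertex_mass / vertex_pt


print(mistag_probability(50.0, alpha_beta=1.5), pseudo_ctau(0.5, 2.0, 10.0))
```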

The per-jet mistag probabilities are multiplied by data-driven estimates of the W+LF yields, for which we must subtract the yields of the other components. We subtract the pretagged W+heavy flavor contributions from the pretagged W+jets yield of Equation (7) to estimate the W+LF yield:

$$N^{\rm pretag}_{W+{\rm LF}} = N^{\rm pretag}_{W+{\rm jets}} - N^{\rm pretag}_{Wbb} - N^{\rm pretag}_{Wcc} - N^{\rm pretag}_{Wcj}. \tag{8}$$

The pretagged W+heavy flavor contributions are estimated by dividing the tagged W+heavy flavor contributions by the b-tagging efficiencies for each event category. The mistag parameterization is applied to each of the Monte Carlo and data samples used in Equations (7) and (8) so that the total mistag yield prediction is not biased by differences in the kinematics of the several W+jets flavor categories.
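Equation (8), together with the efficiency correction just described, amounts to the following arithmetic. The tagged yields and b-tagging efficiencies below are invented for illustration, and the function name is ours.

```python
def pretag_w_lf_yield(n_pretag_w_jets, tagged_hf_yields, tag_efficiencies):
    """Equation (8): remove the pretag Wbb, Wcc, and Wcj yields from the pretag
    W+jets yield. Each pretag yield is the tagged yield divided by the b-tagging
    efficiency for that event category."""
    pretag_hf = sum(
        tagged_hf_yields[p] / tag_efficiencies[p] for p in tagged_hf_yields
    )
    return n_pretag_w_jets - pretag_hf


# Hypothetical inputs for illustration only.
n_w_lf = pretag_w_lf_yield(
    8500.0,
    tagged_hf_yields={"Wbb": 100.0, "Wcc": 30.0, "Wcj": 20.0},
    tag_efficiencies={"Wbb": 0.50, "Wcc": 0.15, "Wcj": 0.10},
)
print(n_w_lf)
```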

We use ALPGEN+PYTHIA Monte Carlo samples to predict the kinematics of W+LF events for use in the analyses of this paper. The mistag rate parameterization described above is applied to each jet in W+LF MC events, and these rates are used to weight the events to predict the yield of mistagged events in each bin of each histogram of each variable.
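One way to turn per-jet mistag probabilities into event weights is sketched below. It assumes, as our own simplification, that jets in an event are mistagged independently; the weights for exactly one mistag and for two or more mistags can then fill separate weighted histograms for the single-tag and double-tag predictions. The event list and the histogrammed variable are hypothetical.

```python
from bisect import bisect_right
from math import prod


def mistag_weights(jet_probs):
    """Return (P(exactly one mistag), P(two or more mistags)) for one event,
    assuming independent per-jet mistag probabilities."""
    p_none = prod(1.0 - p for p in jet_probs)
    p_one = sum(
        p * prod(1.0 - q for j, q in enumerate(jet_probs) if j != i)
        for i, p in enumerate(jet_probs)
    )
    return p_one, 1.0 - p_none - p_one


def weighted_histogram(edges, values, weights):
    """Sum event weights into bins [edges[i], edges[i+1]); out-of-range values are dropped."""
    counts = [0.0] * (len(edges) - 1)
    for x, w in zip(values, weights):
        i = bisect_right(edges, x) - 1
        if 0 <= i < len(counts):
            counts[i] += w
    return counts


# Hypothetical W+LF Monte Carlo events: (per-jet mistag probabilities, variable value).
events = [([0.01, 0.02], 55.0), ([0.015], 80.0), ([0.01, 0.01], 30.0)]
single = [mistag_weights(probs)[0] for probs, _ in events]
double = [mistag_weights(probs)[1] for probs, _ in events]
single_hist = weighted_histogram([0.0, 50.0, 100.0], [x for _, x in events], single)
print(single_hist, sum(double))
```

The weighted histogram supplies the shape; in the text the overall normalization comes from the data-driven W+LF yield.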

The predicted numbers of background events, signal events, and the overall expected normalizations are given in Table I for events with exactly one b tag, and in Table II for events with two or three b tags. Only two selected events in the data have three b tags, consistent with the expectation assuming that the third tag is a mistag. The observed event counts and predicted yields are summarized graphically as functions of jet multiplicity in Fig. 15.

FIG. 15: The number of events predicted and observed for W+jets events in which at least one jet is b-tagged. The data are indicated with points, and the shaded histograms show the signal and background predictions, which are stacked to form the total prediction. The stacking order is the same as the legend. The systematic uncertainty on the rates is far too large to use a simple counting experiment to measure the single top quark cross section.



E. Validation of Monte Carlo Simulation 

Because multivariate analyses depend so heavily on properly simulating events, it is very important to validate the modeling of the distributions in Monte Carlo by checking them with the data. We do this by comparing hundreds of data and Monte Carlo distributions. We make comparisons in control samples in which no jets have been b-tagged to test the W+LF shapes; we test the modeling of W+1 jet events to examine the W+heavy flavor fractions and shapes; we compare the data and Monte Carlo distributions of kinematic variables in the signal regions of tagged 2- and 3-jet events to check the modeling of all of these variables; and we verify the modeling of the correlations between the discriminating variables.

A sample of the validation plots we examine is shown in Figs. 16, 17, and 18. The close match of the distributions gives confidence in the results. The validations of the modeling of other observable quantities are shown later in this paper.

Out of the hundreds of distributions checked for discrepancies, only two distributions in the untagged W+jets data were found to be poorly simulated by our Monte Carlo model: the pseudorapidity of the lowest-energy jet in both W+2 jet and W+3 jet events, and the distance between the two jets in η-φ space in W+2 jet events. These discrepancies are used to estimate systematic uncertainties on the shapes of our final discriminant variables. These distributions and the discussion of the associated systematic uncertainties are presented in Section VIII.

TABLE I: Summary of the predicted numbers of signal and background events with exactly one b tag, with systematic uncertainties on the cross section and Monte Carlo efficiencies included. The total numbers of observed events passing the event selections are also shown. The W + 2 jets and W + 3 jets samples are used to test for the signal, while the W + 1 jet and W + 4 jets samples are used to check the background modeling.

Process            W + 1 jet         W + 2 jets        W + 3 jets        W + 4 jets
Wbb                823.7 ± 249.6     581.1 ± 175.1     173.9 ± 52.5      44.8 ± 13.7
Wcc                454.7 ± 141.7     288.5 ± 89.0      95.7 ± 29.4       27.2 ± 8.5
Wcj                709.6 ± 221.1     247.3 ± 76.2      50.8 ± 15.6       10.2 ± 3.2
Mistags            1147.8 ± 166.0    499.1 ± 69.1      150.3 ± 21.0      39.3 ± 6.2
Non-W              62.9 ± 25.2       [illegible]       [illegible]       [illegible]
tt production      17.9 ± 2.6        167.6 ± 24.0      377.3 ± 54.8      387.4 ± 54.8
Diboson            29.0 ± 3.0        83.3 ± 8.5        28.1 ± 2.9        7.1 ± 0.7
Z/γ*+jets          38.6 ± 6.3        34.8 ± 5.3        14.6 ± 2.2        4.0 ± 0.6
Total background   3284.1 ± 633.8    1989.9 ± 349.6    926.0 ± 113.4     527.7 ± 60.3
s-channel          10.7 ± 1.6        45.3 ± 6.4        14.7 ± 2.1        3.3 ± 0.5
t-channel          24.9 ± 3.7        85.3 ± 12.6       22.7 ± 3.3        4.4 ± 0.6
Total prediction   3319.7 ± 633.8    2120.4 ± 350.1    963.4 ± 113.5     535.4 ± 60.3
Observation        3516              2090              920               567

TABLE II: Summary of the predicted numbers of signal and background events with two or more b tags, with systematic uncertainties on the cross section and Monte Carlo efficiencies included. The total numbers of observed events passing the event selections are also shown. The W + 2 jets and W + 3 jets samples are used to test for the signal, while the W + 4 jets sample is used to check the background modeling.

Process            W + 2 jets        W + 3 jets        W + 4 jets
Wbb                75.9 ± 23.6       27.4 ± 8.5        8.2 ± 2.6
Wcc                3.7 ± 1.2         2.4 ± 0.8         1.1 ± 0.4
Wcj                3.2 ± 1.0         1.3 ± 0.4         0.4 ± 0.1
Mistags            2.2 ± 0.6         1.6 ± 0.4         0.7 ± 0.2
Non-W              2.3 ± 0.9         0.2 ± 0.1         2.4 ± 1.0
tt production      36.4 ± 6.0        104.7 ± 17.3      136.0 ± 22.4
Diboson            5.0 ± 0.6         2.0 ± 0.3         0.6 ± 0.1
Z/γ*+jets          1.7 ± 0.3         1.0 ± 0.2         0.3 ± 0.1
Total background   130.4 ± 26.8      140.6 ± 19.7      149.8 ± 22.5
s-channel          12.8 ± 2.1        4.5 ± 0.7         1.0 ± 0.2
t-channel          2.4 ± 0.4         3.5 ± 0.6         1.1 ± 0.2
Total prediction   145.6 ± 26.9      148.6 ± 19.7      151.9 ± 22.5
Observation        139               166               154

VI. JET FLAVOR SEPARATOR

In our event selection, we identify b-quark jets by requiring a reconstructed secondary vertex. A large fraction, 48%, of the expected background events with b-tagged jets have no B hadrons in them at all. This is due to the long lifetime and the mass of charm hadrons, the false reconstruction of secondary vertices in light-flavor jets, and the fact that the fraction of pretagged W+jets events containing B hadrons is small compared with the charm and light-flavored components. Tagged jets without B hadrons can be separated from those containing B hadrons by extending the vertex requirement with reconstructed quantities that differentiate the two classes of jets. These quantities take advantage of the long lifetime (τ ≈ 1.6 ps) and the large mass (m ≈ 5 GeV/c²) of B hadrons.

FIG. 16: Validation plots comparing data and Monte Carlo for basic kinematic quantities for events passing the event selection requirements with two jets and at least one b tag. The data are indicated with points, and the shaded histograms show the signal and background predictions, which are stacked to form the total prediction. The stacking order follows that of the legend.

FIG. 17: Validation plots comparing data and Monte Carlo for basic kinematic quantities for events passing the event selection requirements with three identified jets and at least one b tag. The data are indicated with points, and the shaded histograms show the signal and background predictions, which are stacked to form the total prediction. The stacking order follows that of the legend.

FIG. 18: Validation plots comparing data and Monte Carlo for missing transverse energy for events passing our event selection requirements with two jets (a) and three jets (b), both with at least one b tag. The data are indicated with points, and the shaded histograms show the signal and background predictions, which are stacked to form the total prediction. The stacking order follows that of the legend.

The invariant mass of the tracks in the reconstructed vertex is larger on average for vertices arising from a B hadron decay than for vertices in jets that do not contain B hadrons. The number of tracks in the secondary vertex is also larger on average, and the significance of the transverse decay length (L_xy/σ_xy) is larger for B hadron vertices.

In addition to the vertex properties, attributes of the tracks in the jet are suitable for identifying jets containing a B hadron. Tracks of charged particles originating from the decay of a B hadron have larger impact parameters and higher transverse momenta relative to the jet axis. Semileptonic B hadron decays increase both the number of electrons and muons in b jets and their transverse momenta relative to the jet axis, compared with non-b jets.

To make full use of all discriminating quantities and their correlations, the variables are used as inputs to a neural network which is applied to jets selected by the SECVTX secondary vertex tagger [82]. This network is trained with simulated events of single top quark production and the main background processes, mixed according to the background estimation. Processes with secondary vertices due to B hadron decays are treated as signal, namely single top quark, tt, and Wbb production. Processes containing no b quarks, but charm and light flavors, are treated as background: Wcc, Wcj, and W + light jets.



The NeuroBayes package [S^] used for the neural-network jet flavor separator combines a three-layer feed-forward neural network with a complex, robust preprocessing. Transforming the input variables to be distributed as unit-width Gaussians reduces the influence of long tails; diagonalization and rotation transform the covariance matrix of the variables into a unit matrix. The neural network uses Bayesian regularization techniques for the training process. The network consists of one input node for each input variable plus one bias node, ten hidden nodes, and one output node which gives a continuous output variable b_NN in the interval [-1, 1]. Jets with secondary vertices induced by the decay of a B hadron tend to have b_NN values close to 1, while jets with falsely reconstructed vertices tend to have b_NN values near -1.
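The two preprocessing steps and the network layout can be illustrated with NumPy. This is a hedged sketch, not the NeuroBayes implementation: here the Gaussian transformation uses a rank-based inverse normal CDF, the decorrelation uses an eigen-decomposition of the sample covariance, and the network weights are random rather than trained with Bayesian regularization. The bias vectors play the role of the bias node, and the tanh output lies in [-1, 1].

```python
from statistics import NormalDist

import numpy as np


def gaussianize(x):
    """Map each input column to a unit-width Gaussian through its empirical CDF."""
    n, d = x.shape
    inv_cdf = NormalDist().inv_cdf
    out = np.empty((n, d))
    for j in range(d):
        ranks = np.argsort(np.argsort(x[:, j]))  # 0 .. n-1
        u = (ranks + 0.5) / n                     # strictly inside (0, 1)
        out[:, j] = [inv_cdf(v) for v in u]
    return out


def decorrelate(z):
    """Rotate and rescale so the sample covariance becomes the unit matrix."""
    centered = z - z.mean(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(centered, rowvar=False))
    return centered @ eigvec / np.sqrt(eigval)


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Inputs -> ten tanh hidden nodes -> one tanh output in [-1, 1]."""
    hidden = np.tanh(x @ w_hidden + b_hidden)
    return np.tanh(hidden @ w_out + b_out)


rng = np.random.default_rng(0)
raw = rng.exponential(size=(2000, 3))   # long-tailed toy inputs
raw[:, 1] += raw[:, 0]                  # introduce a correlation
prepared = decorrelate(gaussianize(raw))
n_inputs, n_hidden = prepared.shape[1], 10
b_nn = forward(
    prepared,
    rng.normal(size=(n_inputs, n_hidden)), rng.normal(size=n_hidden),
    rng.normal(size=n_hidden), rng.normal(),
)
```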

The significances of the training variables are determined automatically during the preprocessing in NeuroBayes. The correlation matrix of all preprocessed input variables is calculated, including the correlation of each variable with the target variable, which is +1 for jets with B hadron decays and -1 for all other jets. The variables are omitted one at a time to determine the loss of total correlation to the target caused by their removal. The variable with the smallest loss of correlation is discarded, leading to an (n-1)-dimensional correlation matrix. The same procedure is repeated with the reduced correlation matrix to find the least important of the (n-1) remaining variables. The significance of each variable is calculated by dividing the loss of correlation induced by its removal by the square root of the sample size. We investigated 50 candidate input variables but chose to include as inputs only those with a significance larger than 3.0, of which there are 25.
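The iterative ranking can be sketched with the standard multiple-correlation formula R² = cᵀC⁻¹c, where C is the correlation matrix of the retained inputs and c holds their correlations with the target. Using this formula as the measure of "total correlation" is our assumption about the NeuroBayes internals, and the conversion of the losses into significances is not shown. In the toy example the three inputs are mutually uncorrelated, so the variable most weakly correlated with the target is dropped first.

```python
import numpy as np


def total_correlation(corr, target_corr, idx):
    """Multiple correlation R of the target with the variables in idx."""
    c = target_corr[idx]
    sub = corr[np.ix_(idx, idx)]
    return float(np.sqrt(c @ np.linalg.solve(sub, c)))


def removal_order(corr, target_corr):
    """Repeatedly drop the variable whose removal loses the least total correlation."""
    remaining = list(range(len(target_corr)))
    order, losses = [], []
    while len(remaining) > 1:
        full = total_correlation(corr, target_corr, remaining)
        trials = []
        for v in remaining:
            rest = [u for u in remaining if u != v]
            trials.append((full - total_correlation(corr, target_corr, rest), v))
        loss, least_important = min(trials)
        order.append(least_important)
        losses.append(loss)
        remaining.remove(least_important)
    # The last variable carries its whole correlation with the target.
    last = remaining[0]
    order.append(last)
    losses.append(abs(float(target_corr[last])))
    return order, losses


corr = np.eye(3)                         # toy inputs, mutually uncorrelated
target_corr = np.array([0.5, 0.3, 0.1])  # toy correlations with the target
order, losses = removal_order(corr, target_corr)
print(order, [round(x, 4) for x in losses])
```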

Because the neural-network jet flavor separator is trained using simulated events, it is essential to verify that the input and output distributions are modeled well, and to assess systematic uncertainties where discrepancies are seen. The shapes of the input variable distributions in the data are found to be reasonably well reproduced by the simulation. We also examine the distribution of b_NN for both b signal and non-b background. The b signal distribution is checked with double-SECVTX-tagged dijet events and compared against Monte Carlo jets with B hadron decays. One jet is additionally required to contain an electron with a large transverse momentum with respect to the jet axis, in order to purify further the b content of the sample. The jet opposite the electron-tagged jet is probed for its distribution of the neural network output. The distribution of b_NN in these jets is well simulated by that of b jets in the Monte Carlo [Q].

To test the response of the network to light-flavored jets, negative-tagged jets were compared in data and Monte Carlo. A correction function was derived [s^] to adjust for the small discrepancy observed in the output shape. This correction function is parameterized in the sum of transverse energies in the event, the number of tracks per jet, and the transverse energy of the jet. The correction function is applied to light-flavored and charm Monte Carlo jets in the analyses presented in this paper, but not to b jets. The uncorrected neural network outputs are used to evaluate systematic uncertainties on the shapes of the final discriminant distributions.

The resulting network output b_NN distinguishes the b signal from the charm and light-flavored background processes with a purity that increases with increasing b_NN, as can be seen in Fig. 14(a). Furthermore, the network gives very similar shapes for different b-quark-producing processes, indicating that it is sensitive to the properties of b-quark jets and does not depend on the underlying processes that produce them.

Not only is b_NN a valuable tool for separating the single top quark signal from background processes that do not contain b jets; it is also valuable for separating the different flavors of W+jets events, which is crucial in estimating the background composition. As described in Section V, the distribution of b_NN is fit in b-tagged W+1 jet events, and the heavy-flavor fractions for b and charm jets are extracted. Using also a direct measurement of the Wc rate [81], predictions are made of the b and charm jet fractions in the two- and three-jet bins. These predictions are used to scale the ALPGEN Monte Carlo samples, which are then compared with the data in the two- and three-jet b-tagged samples, without refitting the heavy-flavor composition, as shown in Figs. 14(c) and 14(d). The three-jet sample contains a larger fraction of tt events, which are enriched in b jets. The successful modeling of the changing flavor composition as a function of the number of identified jets provides confidence in the correctness of the background simulation.

All multivariate methods described here use b_NN as an input variable, and thus we need b_NN values for all Monte Carlo and data events used to model the final distributions. For the mistagged W+LF shape prediction, we use the W+LF Monte Carlo sample, in which the events are weighted by the data-based mistag prediction for each taggable jet. This procedure improves the modeling over what would be obtained if Monte Carlo mistags were used, since the mistag probabilities are based on the data, and it increases the sample size available for the mistag modeling. One issue is that parameterized mistagged events do not have b_NN values, so random values must be drawn for them from the distribution in light-flavor events. If a W+LF event has more than one taggable jet, random values are assigned to both jets. These events are used for both the single-mistag prediction and the double-mistag prediction with appropriate weights. The randomly chosen flavor-separator values must be the same event-by-event and jet-by-jet for each of the four analyses in this paper in order for the super discriminant combination method to be consistent.
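The text does not say how the common random values are generated. One possible approach, shown here purely as an assumption, is to seed a random number generator from the run number, event number, and jet index, so that every analysis processing the same jet obtains the identical draw. The four-bin light-flavor b_NN template and all names are invented.

```python
import hashlib
import random
from bisect import bisect_left
from itertools import accumulate

# Invented light-flavor b_NN template: bin edges on [-1, 1] and relative contents.
BNN_EDGES = [-1.0, -0.5, 0.0, 0.5, 1.0]
LF_CONTENTS = [0.55, 0.25, 0.15, 0.05]
LF_CDF = list(accumulate(LF_CONTENTS))


def jet_rng(run, event, jet_index):
    """Random generator seeded only by the jet's identity."""
    key = f"{run}:{event}:{jet_index}".encode()
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return random.Random(seed)


def random_bnn(run, event, jet_index):
    """Draw b_NN for a parameterized mistag: pick a bin by inverse CDF, then a
    uniform value inside that bin."""
    rng = jet_rng(run, event, jet_index)
    b = bisect_left(LF_CDF, rng.random() * LF_CDF[-1])
    return rng.uniform(BNN_EDGES[b], BNN_EDGES[b + 1])


# Two analyses asking for the same jet get the same value.
first_pass = random_bnn(run=123456, event=789, jet_index=0)
second_pass = random_bnn(run=123456, event=789, jet_index=0)
other_jet = random_bnn(run=123456, event=789, jet_index=1)
print(first_pass, second_pass, other_jet)
```

Deriving the seed from the jet's identity, rather than from a shared generator state, keeps the draw independent of the order in which each analysis happens to read its events.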

The distributions of b_NN for non-W multijet events are more difficult to predict because the flavor composition of the jets in these events is poorly known. Specifically, since a non-W event must have a fake lepton (or a lepton from heavy-flavor decay) and also mismeasured E̸_T, the flavor composition of events passing the selection requirements depends on the details of the detector response, particularly in the tails of distributions, which are difficult to model. It is therefore necessary to constrain these flavor fractions with CDF data, and the flavor fractions thus estimated are specific to this analysis. The non-W event yields are constrained by the data as explained in Section V B.

The fraction of each flavor (b, charm, and light-flavored jets, the last originating from light quarks or gluons) is estimated by applying the jet flavor separator to b-tagged jets in the 15 < E̸_T < 25 GeV sideband of the data. In this sample, we find a flavor composition of 45% b quark jets, 40% c quark jets, and 15% light-flavored jets. Each event in the non-W modeling samples (see Section V B) is randomly assigned a flavor according to these fractions and then assigned a jet flavor separator value chosen at random from the appropriate flavor distribution. The fractions of the non-W events in the signal sample are uncertain due both to the uncertainties in the sideband fit and to the extrapolation to the signal sample. As an alternative flavor composition we take 60% b quark jets, 30% c quark jets, and 10% light-flavored jets, which is the most b-like possibility allowed by the uncertainties on the flavor measurement. This alternative flavor composition affects the shapes of the final discriminant distributions through the different flavor-separator neural network values.
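The random flavor assignment for non-W events can be sketched as follows. The flavor fractions are the nominal (45%, 40%, 15%) and alternative (60%, 30%, 10%) compositions quoted above; the per-flavor b_NN samplers are clipped Gaussians invented as stand-ins for the real templates, and the function names are ours.

```python
import random

# Flavor compositions quoted in the text.
NOMINAL_FRACTIONS = {"b": 0.45, "c": 0.40, "light": 0.15}
ALTERNATIVE_FRACTIONS = {"b": 0.60, "c": 0.30, "light": 0.10}


def clipped_gauss(mean, width):
    """Hypothetical b_NN sampler: a Gaussian clipped to the output range [-1, 1]."""
    return lambda rng: max(-1.0, min(1.0, rng.gauss(mean, width)))


# Invented stand-ins for the b, charm, and light-flavor b_NN templates.
BNN_SAMPLERS = {
    "b": clipped_gauss(0.6, 0.4),
    "c": clipped_gauss(-0.2, 0.5),
    "light": clipped_gauss(-0.7, 0.3),
}


def model_non_w_event(rng, fractions=NOMINAL_FRACTIONS):
    """Assign a flavor according to `fractions`, then draw b_NN for that flavor."""
    flavors = list(fractions)
    flavor = rng.choices(flavors, weights=[fractions[f] for f in flavors])[0]
    return flavor, BNN_SAMPLERS[flavor](rng)


rng = random.Random(2010)
nominal = [model_non_w_event(rng) for _ in range(100_000)]
alternative = [model_non_w_event(rng, ALTERNATIVE_FRACTIONS) for _ in range(100_000)]
b_share = sum(f == "b" for f, _ in nominal) / len(nominal)
b_share_alt = sum(f == "b" for f, _ in alternative) / len(alternative)
print(round(b_share, 3), round(b_share_alt, 3))
```

Switching between the two dictionaries is all that is needed to produce the alternative-composition shapes used for the systematic variation.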



VII. MULTIVARIATE ANALYSIS 

The search for single top quark production and the measurement of its cross section present substantial experimental challenges. Compared with the search for tt production, the search for single top quarks suffers from a lower SM production rate and a larger background. Single top quark events are also kinematically more similar to W+jets events than tt events are, since there is only one heavy top quark, and thus only one W boson, in single top quark events, while in tt events there are two top quarks, each decaying to Wb. The most serious challenge arises from the systematic uncertainty on the background prediction, which is approximately three times the size of the expected signal. Simply counting events that pass our selection requirements will not yield a precise measurement of the single top quark cross section no matter how much data are accumulated, because the systematic uncertainty on the background is so large. In fact, to have sufficient sensitivity to expect to observe a signal at the 5σ level, the systematic uncertainty on the background must be less than one-fifth of the expected signal rate.
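The numbers in Table I make this argument concrete. For W + 2 jet events with exactly one b tag, the predicted signal is 45.3 + 85.3 = 130.6 events and the predicted background is 1989.9 events, with a systematic uncertainty of 349.6 events, roughly 2.7 times the signal. The sketch below uses the simple counting estimate Z = S/√(B + δB²), an approximation adopted here for illustration rather than the statistical treatment used in this paper, and scales all yields by a luminosity factor k with the relative background uncertainty held fixed. As k grows, Z approaches S/δB, which is why δB < S/5 is needed to expect a 5σ result.

```python
import math

# W + 2 jets, exactly one b tag (Table I).
SIGNAL = 45.3 + 85.3        # s-channel + t-channel
BACKGROUND = 1989.9
BACKGROUND_SYST = 349.6     # absolute systematic uncertainty on the background


def counting_significance(k=1.0):
    """Approximate Z = S / sqrt(B + dB^2) after scaling all yields by k,
    keeping the relative background uncertainty fixed."""
    s, b, db = k * SIGNAL, k * BACKGROUND, k * BACKGROUND_SYST
    return s / math.sqrt(b + db**2)


print(f"dB / S               = {BACKGROUND_SYST / SIGNAL:.2f}")
print(f"Z with current yields = {counting_significance():.2f}")
print(f"Z with 10^6 x yields  = {counting_significance(1e6):.3f}")
print(f"limit S / dB          = {SIGNAL / BACKGROUND_SYST:.3f}")
```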

Further separation of the signal from the background is 
required. Events that are classified as being more signal- 
like are used to test for the presence of single top quark 
production and measure the cross section, and events 
that are classified as being more background-like improve 
our knowledge of the rates of background processes. In 
order to optimize our sensitivity, we construct discriminant functions based on kinematic and b-tag properties of the events. We classify the events on a continuous spectrum that runs from very signal-like for high values of the discriminants to very background-like for low values of the discriminants. We fit the distributions of these discriminants to the background and signal+background predictions, allowing the uncertain parameters listed in Section VIII to float, in a manner described in Section IX.
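As an illustration of fitting a discriminant distribution, the following sketch scans a single signal-strength factor β in a binned Poisson likelihood, with expected yield βs_j + b_j in bin j. It is a simplified stand-in. The analysis also lets the uncertain parameters float, and this sketch omits them. The grid scan, the function name, and the template yields are hypothetical choices made for illustration.

```python
import numpy as np


def fit_signal_strength(data, sig, bkg, betas=None):
    """Grid-scan maximum-likelihood estimate of a signal-strength factor beta.

    data, sig, bkg: per-bin observed counts and predicted signal and background
    yields of one discriminant histogram. Every bin must have bkg > 0.
    """
    if betas is None:
        betas = np.linspace(0.0, 3.0, 3001)
    data = np.asarray(data, dtype=float)
    sig = np.asarray(sig, dtype=float)
    bkg = np.asarray(bkg, dtype=float)
    mu = np.outer(betas, sig) + bkg           # expected yields, shape (n_beta, n_bins)
    # binned Poisson -ln L, dropping the beta-independent ln(n!) terms
    nll = np.sum(mu - data * np.log(mu), axis=1)
    return float(betas[np.argmin(nll)])


sig = [0.5, 2.0, 6.0]      # illustrative template yields, not CDF numbers
bkg = [50.0, 20.0, 4.0]
asimov = [s + b for s, b in zip(sig, bkg)]   # "data" equal to the SM expectation
print(fit_signal_strength(asimov, sig, bkg))
```

Because the high-discriminant bins carry most of the signal and little background, they dominate the fitted β. This is why classifying events on a continuous signal-to-background spectrum is more sensitive than a single counting window.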

To separate signal events from background events, we 
look for properties of the events that differ between signal 
and background. Events from single top quark produc- 
tion have distinctive energy and angular properties. The 
backgrounds, too, have distinctive features which can be 
exploited to help separate them. Many of the variables 
we compute for each selected candidate event are moti- 
vated by a specific interpretation of the event as a signal 
event or a background event. It is not necessary that 
all variables used in a discriminant are motivated by the 
same interpretation of an event, nor do we rely on the cor- 
rectness of the motivation for the interpretation of any 
given event. Indeed, each analysis performs better
when it includes a mixture of variables that are based on
different ways to interpret the measured particles in the 
events. We optimize our analyses by using variables for 
which the distributions are maximally different between 
signal events and background events, and for which we 
have reliable modeling as verified by the data. 

We list below some of the most sensitive variables, 
and explain why they are sensitive in terms of the differ- 
ences between the signal and background processes that 
they exploit. The three multivariate discriminants (likelihood functions, neural networks, and boosted decision trees) use these variables, or variations of them, as inputs; the analyses also use other variables. The matrix element analysis uses all of these features implicitly, and it uses b_NN explicitly. Normalized Monte Carlo predictions ("templates") and modeling comparisons of these variables are shown in Figs. 19 and 20.

• M_ℓνb: the invariant mass of the charged lepton, the neutrino, and the b jet from the top quark decay. The p_z of the neutrino cannot be measured. It is inferred by constraining M_ℓν to the W boson mass, using the measured momentum of the charged lepton candidate and setting p_T^ν = E̸_T. The neutrino's p_z is the solution of a quadratic equation, which may have two real solutions, one real solution, or two complex solutions. When there are two real solutions, the one with the lower |p_z| is chosen. In the complex case, the real part of the p_z solution is chosen. Some analyses use variations of this variable with different treatments of the unmeasured p_z of the neutrino. The distribution of M_ℓνb peaks near m_t for signal events, with broader spectra for background events from different processes.

• H_T: the scalar sum of the transverse energies of the jets, the charged lepton, and E̸_T in the event. This quantity is much larger for tt̄ events than for W+jets events. Single top quark events populate the region between W+jets events and tt̄ events in this variable.

• M_jj: the invariant dijet mass, which is substantially higher on average for events containing top quarks than for W+jets events.

• Q × η: the sign of the charge of the lepton times the pseudorapidity of the light quark jet [84]. Large Q × η is characteristic of t-channel single top quark events. The light quark recoiling from the single top quark often retains much of the momentum component along the z axis that it had before radiating the W boson, so it often produces a jet at high |η|. Multiplying η by the sign of the lepton's charge Q improves the separation power of this variable. This is because 2/3 of t-channel single top quark production is initiated by a u quark in the proton or a ū quark in the antiproton. The sign of the lepton's charge determines the sign of the top quark's charge, and it is correlated with the sign of the η of the recoiling light-flavored jet. The other 1/3 of single top quark production is initiated by down-type quarks and has the opposite charge-η correlation. W+jets and tt̄ events lack this correlation, and they also have fewer jets passing our E_T requirement at large |η| than the single top quark signal.

• cos θ_ℓj: the cosine of the angle between the charged lepton and the light quark jet [23]. For t-channel events, this tends to be positive because of the V−A angular dependence of the W boson vertex. This variable is most powerful when computed in the rest frame of the top quark.

• b_NN: the jet flavor separator described in Section VI. This variable is a powerful tool for separating the signal from W+LF and W+charm events.

• M_T^W: the "transverse mass" of the charged lepton candidate and the E̸_T vector. The transverse mass is defined as the invariant mass of the projections of the three-momentum components onto the plane perpendicular to the beam axis. It is defined this way so that it does not depend on the unmeasured p_z of the neutrino. Events without W bosons (but with fake leptons and mismeasured E̸_T) have lower M_T^W on average than W+jets events, signal events, and tt̄ events. Events with two leptonically decaying W bosons (some diboson and tt̄ events) have even higher average values of M_T^W. The distribution of M_T^W is an important cross-check of the modeling of the non-W background rate and shape.
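Several of these variables can be computed directly from reconstructed momenta. The sketch below makes three assumptions: the charged lepton and the b jet are massless, the neutrino p_T is identified with the E̸_T vector, and m_W = 80.4 GeV/c² is used for the constraint. The neutrino p_z follows the M_ℓνb recipe above: the smaller |p_z| of two real roots, or the real part for complex roots. The variants used by some analyses are not reproduced, and all function names are hypothetical.

```python
import math

M_W = 80.4  # GeV/c^2, assumed value for the W-mass constraint


def neutrino_pz(lep, met, m_w=M_W):
    """Neutrino p_z from M_lnu = m_w, treating the lepton as massless.

    lep: (px, py, pz) of the charged lepton; met: (x, y) missing-E_T vector,
    taken as the neutrino p_T. Returns (pz, both_roots_real).
    """
    lx, ly, lz = lep
    nx, ny = met
    pt2_l = lx * lx + ly * ly
    mu = 0.5 * m_w ** 2 + lx * nx + ly * ny
    # a*pz^2 + b*pz + c = 0 follows from squaring E_l*E_nu = mu + lz*pz
    a = pt2_l
    b = -2.0 * mu * lz
    c = (pt2_l + lz * lz) * (nx * nx + ny * ny) - mu * mu
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return -b / (2.0 * a), False           # complex pair: keep the real part
    root = math.sqrt(disc)
    z1, z2 = (-b + root) / (2.0 * a), (-b - root) / (2.0 * a)
    return min(z1, z2, key=abs), True          # two real roots: smaller |pz|


def four_vector(px, py, pz, m=0.0):
    return (math.sqrt(px * px + py * py + pz * pz + m * m), px, py, pz)


def inv_mass(*vecs):
    e, px, py, pz = (sum(v[k] for v in vecs) for k in range(4))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def m_lnub(lep, met, bjet):
    """M_lnub with a massless b jet given by its (px, py, pz)."""
    pz, _ = neutrino_pz(lep, met)
    nu = four_vector(met[0], met[1], pz)
    return inv_mass(four_vector(*lep), nu, four_vector(*bjet))


def m_t_w(lep, met):
    """Transverse mass of the lepton and the missing-E_T vector (massless particles)."""
    pt_l, pt_n = math.hypot(lep[0], lep[1]), math.hypot(met[0], met[1])
    return math.sqrt(max(2.0 * (pt_l * pt_n - lep[0] * met[0] - lep[1] * met[1]), 0.0))


def h_t(jet_ets, lepton_et, met):
    """Scalar sum of the jet E_T values, the lepton E_T, and the missing E_T."""
    return sum(jet_ets) + lepton_et + math.hypot(met[0], met[1])
```

The W-mass constraint is quadratic because E_ℓE_ν = μ + p_z^ℓ p_z^ν, with μ = m_W²/2 + p_T^ℓ·p_T^ν, has to be squared to eliminate the square root in E_ν. When M_T^W exceeds m_W, no real solution exists, which is the complex case handled above.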

While there are many distinctive properties of a single 
top quark signal, no single variable is sufficiently sensi- 
tive to extract the signal with the present data sample. 
We must therefore use techniques that combine the discrimination power of many variables. We use four such techniques in the W+jets sample: a multivariate likelihood function, a matrix element method, an artificial
neural network, and a boosted decision tree. These are 
described in detail in the following sections. Each of 
these techniques makes use of the most sensitive vari- 
ables described above in different ways, and in combi- 
nation with other variables. The measurements using 
the separate techniques are highly correlated because the 
same events are analyzed with each technique and be- 
cause many of the same features are used, but the dif- 
ferences between the techniques provide more discrim- 
ination power in combination as well as the ability to 
cross-check each result with the others separately. 

The measured single top quark cross section and the 
significance of the result depend on the proper modeling 
of the input variable distributions for the signals and the 
background processes. We examine the distributions of 
all input variables in the selected candidate events, com- 
paring the data to the sum of the background and SM 
signal predictions, and we also compare the distributions 
in a sample of events with no b tags but which pass all 
other event selection requirements. The untagged event 
sample is much larger than the tagged data sample and 
has no overlap with it, providing very precise checks of 
the Monte Carlo's modeling of the data. We do not limit 
the investigation to input variables but also check the 
distributions of other kinematic variables not used in the 
discriminants. We also check the distributions of each 
discriminant output variable in events with no b tags. 
Each of these investigations is done for each technique, 
for 2-jet and 3-jet events separately, and for each category of charged lepton candidates, which requires examining thousands of histograms.

A. Multivariate Likelihood Function 

A multivariate likelihood function (LF) [85] is one
method for combining several sensitive variables. This
method makes use of the relative probabilities of finding 
an event in histograms of each input variable, compared 
between the signal and the background. 

The likelihood function L_k for event class k is constructed using binned probability density functions for each input variable. The probability that an event from sample k will populate bin j of input variable i is defined to be f_ijk. The probabilities are normalized so that Σ_j f_ijk = 1 for all variables i and all samples k. The signal is k = 1. In this paper, four background classes are used to construct the likelihood function: Wbb̄, tt̄, Wcc̄/Wc, and W+LF, which are event classes k = 2, 3, 4, and 5, respectively. Histogram underflows and overflows are properly accounted for. The likelihood function
for an event is computed in two steps. First, for each re- 
constructed variable i, the bin j in which the event falls 
is obtained, and the quantities 

P_{ik} = \frac{f_{ijk}}{\sum_{m=1}^{5} f_{ijm}},    (9)

are computed for each variable i and each event class k. The P_ik are used to compute

L_k = \frac{\prod_{i=1}^{n_{var}} P_{ik}}{\sum_{m=1}^{5} \prod_{i=1}^{n_{var}} P_{im}},    (10)

where n_var is the number of input variables. The signal likelihood function, referred to as the LF discriminant in the following, is the one that corresponds to the signal class of events, L_1. This method does not take advantage of the correlations between input variables, which
may be different between the signal and the background 
processes. The predicted distributions of the likelihood 
functions are made from fully simulated Monte Carlo 
and data sets where appropriate, with all correlations in 
them, and so while correlations are not taken advantage 
of, they are included in the necessary modeling. The 
reduced dependence on the correlations makes the LF 
analysis an important cross-check on the other analyses, 
which make use of the correlations. More detailed information on this method can be found in [86] and [87].
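Equations (9) and (10) can be sketched in a few lines of NumPy. In this sketch, each input variable has its own binning and the normalized templates f_ijk are supplied as arrays. Underflows and overflows are folded into the first and last bins. That folding is an assumed convention, since the text says only that they are properly accounted for. The function and argument names are hypothetical.

```python
import numpy as np


def bin_index(x, edges):
    """Bin of x; underflow/overflow folded into the first/last bin (assumed convention)."""
    return int(np.clip(np.digitize(x, edges) - 1, 0, len(edges) - 2))


def likelihood_functions(x, edges, f):
    """Return L_k for every event class k, for one event.

    x     -- values of the n_var input variables for the event
    edges -- list of bin-edge arrays, one per variable
    f     -- list of arrays; f[i][j, k] is the normalized template f_ijk,
             with f[i][:, k].sum() == 1 for every variable i and class k
    """
    p = []
    for i, xi in enumerate(x):
        row = np.asarray(f[i])[bin_index(xi, edges[i]), :]   # f_ijk in the event's bin j
        p.append(row / row.sum())                          # Eq. (9): P_ik
    prod = np.prod(np.array(p), axis=0)                    # product over variables i
    return prod / prod.sum()                               # Eq. (10): L_k for all k
```

The signal discriminant is the k = 1 entry of the returned array. Because the product treats the variables as independent, correlations enter only through the fully simulated templates used to model the output, as described above.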

Three likelihood functions are computed for use in the search for single top quark production. The first, L_t, is optimized for the t-channel signal and is used for events with two jets and one b tag. The second, L_s, is optimized for the s-channel signal and is applied to events with two jets and two b tags. The L_s-based analysis was separately labeled the LFS analysis in [23]. The third, L_3j, is optimized for the sum of s- and t-channel single top quark production and is applied to events with three jets. The inputs to these three likelihood functions are described in Sections VII A 2, VII A 3, and VII A 4, respectively.

FIG. 19: Monte Carlo templates (left) and validation plots (right) comparing data and Monte Carlo for variables with good discriminating power for events passing our selection requirements with two or three identified jets and at least one b tag. The data are indicated with points, and the shaded histograms show the signal and background predictions, which are stacked to form the total prediction. The stacking order follows that of the legend. Overflows are collected in the highest bin of each histogram.



1. Kinematic Constraints 



The likelihood function input variables include the squares of the quantum-mechanical matrix elements, computed using MADGRAPH [50] with the measured four-vectors. These calculations depend very strongly on the invariant masses of the ℓν system and the ℓνb system, which result from the W boson and top quark decays, respectively. The neutrino leaves no trace in the detector: E̸_T is an approximation to its transverse momentum, and p_z^ν is not measured. The b quark is also imperfectly reconstructed: a b-tagged jet's energy is only an approximation to the b quark's momentum. We solve for the p_z of the neutrino and the energy of the b quark while requiring that M_ℓν = M_W and M_ℓνb = m_t. The W boson mass constraint results in two solutions. If both are real, the one with the smaller |p_z| is used. If both are complex, a minimal amount of additional E̸_T is added parallel to the axis of the jet assigned to be the b from the top quark's decay until a real solution for p_z^ν can be obtained. In the rare cases in which this procedure still fails to produce a real p_z^ν, additional E̸_T is added along the b-jet axis to minimize the imaginary part of p_z^ν. A minimal amount of E̸_T is then added perpendicular to the b-jet axis until a real p_z^ν is obtained.

FIG. 20: Monte Carlo templates (left) and validation plots (right) comparing data and Monte Carlo for variables with good discriminating power for events passing our selection requirements with two identified jets and at least one b tag. The data are indicated with points, and the shaded histograms show the signal and background predictions, which are stacked to form the total prediction. The stacking order follows that of the legend. Overflows are collected in the highest bin of each histogram.

The top quark mass constraint can be satisfied by scaling the b jet's energy, holding its direction fixed, until M_ℓνb = m_t. As the b jet's energy is scaled, the E̸_T is adjusted to be consistent with the change. We then recalculate p_z^ν using the M_W constraint described above, and the process is iterated until M_ℓνb = m_t. The resulting four-vectors of the b quark and the neutrino are then used, together with the measured four-vector of the charged lepton, in the matrix element expressions to construct discriminant variables that separate the signal from the background.
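The iterative top-mass constraint can be sketched as a one-dimensional root search on the b-jet energy scale factor s. The sketch rests on several simplifying assumptions that do not come from the text:

- the lepton and the b jet are massless, with m_W = 80.4 and m_t = 172.5 GeV/c²;
- the E̸_T update is taken as the recoil against the change in the b jet's transverse momentum;
- bisection on s ∈ [0.1, 10] stands in for the unspecified iteration;
- complex p_z solutions are replaced by their real part instead of the E̸_T-adjustment procedure described above.

All names are hypothetical.

```python
import math

M_W, M_TOP = 80.4, 172.5  # GeV/c^2; assumed constraint values


def _nu_pz(lep, nx, ny, m_w=M_W):
    # W-mass constraint for a massless lepton: smaller-|pz| real root, else the real part
    lx, ly, lz = lep
    a = lx * lx + ly * ly
    mu = 0.5 * m_w ** 2 + lx * nx + ly * ny
    b = -2.0 * mu * lz
    c = (a + lz * lz) * (nx * nx + ny * ny) - mu * mu
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return -b / (2.0 * a)
    r = math.sqrt(disc)
    return min((-b + r) / (2.0 * a), (-b - r) / (2.0 * a), key=abs)


def _mass(*p3s):
    # invariant mass of massless particles given their 3-momenta
    e = sum(math.sqrt(px * px + py * py + pz * pz) for px, py, pz in p3s)
    sx, sy, sz = (sum(p[k] for p in p3s) for k in range(3))
    return math.sqrt(max(e * e - sx * sx - sy * sy - sz * sz, 0.0))


def m_lnub_scaled(s, lep, met, bjet):
    """M_lnub after scaling the b-jet energy by s (direction fixed) and updating the MET."""
    bx, by, bz = (s * q for q in bjet)
    nx = met[0] - (s - 1.0) * bjet[0]   # MET recoils against the change in b-jet p_T
    ny = met[1] - (s - 1.0) * bjet[1]
    nu = (nx, ny, _nu_pz(lep, nx, ny))
    return _mass(lep, nu, (bx, by, bz))


def solve_b_scale(lep, met, bjet, m_top=M_TOP, lo=0.1, hi=10.0, tol=1e-10):
    """Bisection on the b-jet energy scale s until M_lnub = m_top; returns (s, M_lnub)."""
    f = lambda s: m_lnub_scaled(s, lep, met, bjet) - m_top
    if f(lo) * f(hi) > 0:
        raise ValueError("no sign change of M_lnub - m_top in [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    s = 0.5 * (lo + hi)
    return s, m_lnub_scaled(s, lep, met, bjet)


print(solve_b_scale((30.0, 10.0, 5.0), (-25.0, 20.0), (-20.0, -40.0, 30.0)))
```

Bisection is used here only because it is robust when M_ℓνb(s) changes smoothly across the bracket. It is not a statement of how the analysis performs its iteration.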



2. 2-Jet t-channel Likelihood Function 

The t-channel likelihood function L_t uses seven variables and assumes that the b-tagged jet comes from top quark decay. The variables used are:

• H_T, the scalar sum of the E_T values of the two jets, the lepton E_T, and E̸_T.

• Q × η, the charge of the lepton times the pseudorapidity of the jet that is not b-tagged.

• χ²_kin, the χ² of the comparison between the measured b jet energy and the energy the kinematic constraints require to make M_ℓνb = m_t and M_ℓν = M_W, using the nominal uncertainty in the b jet's energy. Any additional E̸_T added to satisfy the M_ℓν = M_W constraint contributes to χ²_kin using the nominal uncertainty in the E̸_T measurement.

• cos θ_ℓj, the cosine of the angle between the charged lepton and the untagged jet in the top quark decay frame.

• M_jj, the invariant mass of the two jets.

• ME_t-chan, the differential cross section for the t-channel process, as computed by MADGRAPH using the constrained four-vectors of the b, ℓ, and ν.

• The jet flavor separator output b_NN described in Section VI.
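Computing cos θ_ℓj in the top quark rest frame requires boosting both the charged lepton and the jet by the velocity of the reconstructed top quark, that is, the sum of the lepton, neutrino, and b-jet four-vectors. A minimal NumPy sketch of that boost follows. Four-vectors are ordered (E, p_x, p_y, p_z), and the function names are hypothetical.

```python
import numpy as np


def boost_to_rest(p, frame):
    """Lorentz-boost four-vector p = (E, px, py, pz) into the rest frame of `frame`."""
    p = np.asarray(p, dtype=float)
    frame = np.asarray(frame, dtype=float)
    beta = frame[1:] / frame[0]              # velocity of the frame
    b2 = beta @ beta
    if b2 == 0.0:
        return p.copy()                      # frame already at rest
    gamma = 1.0 / np.sqrt(1.0 - b2)
    bp = beta @ p[1:]
    e = gamma * (p[0] - bp)
    vec = p[1:] + ((gamma - 1.0) * bp / b2 - gamma * p[0]) * beta
    return np.concatenate(([e], vec))


def cos_theta_in_frame(a, b, frame):
    """Cosine of the angle between a and b, both measured in the rest frame of `frame`."""
    va = boost_to_rest(a, frame)[1:]
    vb = boost_to_rest(b, frame)[1:]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))
```

Usage: with `top = lepton + neutrino + bjet` as NumPy four-vectors, `cos_theta_in_frame(lepton, light_jet, top)` gives the frame-dependent angle. Boosting `top` itself returns a vector at rest with energy equal to the reconstructed top mass.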



3. 2-Jet s-channel Likelihood Function 

The s-channel likelihood function L_s uses nine variables. These events have exactly two jets, both of which are required to be b-tagged. We therefore decide which jet comes from the top quark decay using a separate likelihood function with three inputs:

- the transverse momentum of the b quark;
- the invariant mass of the b quark and the charged lepton;
- the product of the scattering angle of the b jet in the initial quarks' rest frame and the lepton charge.

To compute this last variable, the neutrino p_z is first solved for using the M_W constraint. The variables input to L_s are:

• M_jj, the invariant mass of the two jets.

• p_T^jj, the transverse momentum of the two-jet system.

• ΔR_jj, the separation between the two jets in φ-η space.

• M_ℓνb, the invariant mass of the charged lepton, the neutrino, and the jet assigned to be the b jet from the top quark decay.

• E_T^j1, the transverse energy of the leading jet, that is, the jet with the largest E_T.

• η_j2, the pseudorapidity of the non-leading jet.

• p_T^ℓ, the transverse momentum of the charged lepton.

• Q × η, the charge of the lepton times the pseudorapidity of the jet that is not assigned to have come from the top quark decay.

• The logarithm of the likelihood ratio constructed from matrix elements computed by MADGRAPH, using the p_z^ν solution that maximizes the likelihood described in the next point. This likelihood ratio is defined as

\frac{ME_s + ME_t}{ME_s + ME_t + ME_{Wb\bar{b}}}.

• The output of a kinematic fitter that chooses the solution of p_z^ν maximizing the likelihood of the solution, allowing the values of p_x^ν and p_y^ν to vary within their uncertainties. This likelihood is multiplied by the likelihood used to choose the b jet that comes from the top quark, and their product is used as a discriminating variable.



4. 3-Jet Likelihood Function

Three-jet events have more ambiguity in the assignment of jets to quarks than two-jet events. One jet must be assigned as the one originating from the b quark from top quark decay. Another jet must be assigned as the recoiling jet, which is a light-flavored quark in the t-channel case and a b quark in the s-channel case. In all, there are six possible assignments of jets to quarks, not allowing for grouping of jets together. The same procedure described in Section VII A 1 is applied to all six possible jet assignments. If only one jet is b-tagged, it is assumed to be the b quark from top quark decay. If two jets are b-tagged, the jet with the highest −log χ² + 0.005 p_T is chosen, where χ² is the smaller of the outputs of the kinematic fitter, one for each solution. This algorithm correctly assigns the b jet 75% of the time.

There is still an ambiguity in the assignment of the other jets. If exactly one of the remaining jets is b-tagged, it is assumed to come from a b quark, and the untagged jet is assigned to be the t-channel recoiling jet. Otherwise, the jet with the larger E_T is assigned to be the t-channel recoiling jet. In all cases, the solution with the smaller |p_z^ν| is used.
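The assignment rules for three-jet events can be written as a short decision procedure. In this sketch the kinematic-fitter χ² values are supplied as inputs rather than computed, and jets are plain dictionaries. The helper name is hypothetical. The coefficient 0.005 multiplying p_T is taken from the text as read, and its units are assumed to be GeV/c.

```python
import math


def assign_three_jets(jets, chi2_min, pt_coeff=0.005):
    """Hypothetical helper: returns (b_from_top, recoil, other) jet indices.

    jets: three dicts with keys "et", "pt", "tagged"; chi2_min[i]: smaller
    kinematic-fitter chi^2 when jet i is the b from top decay (tagged jets only).
    """
    tagged = [i for i, jet in enumerate(jets) if jet["tagged"]]
    if not tagged:
        raise ValueError("at least one b-tagged jet is required")
    if len(tagged) == 1:
        b_top = tagged[0]
    else:
        # highest -log(chi^2) + pt_coeff * p_T among the tagged jets
        b_top = max(tagged, key=lambda i: -math.log(chi2_min[i]) + pt_coeff * jets[i]["pt"])
    rest = [i for i in range(3) if i != b_top]
    rest_tagged = [i for i in rest if jets[i]["tagged"]]
    if len(rest_tagged) == 1:
        other = rest_tagged[0]                           # assumed to come from a b quark
        recoil = rest[1] if rest[0] == other else rest[0]
    else:
        recoil = max(rest, key=lambda i: jets[i]["et"])  # larger E_T: t-channel recoil
        other = rest[1] if rest[0] == recoil else rest[0]
    return b_top, recoil, other
```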

The likelihood function L_3j is defined with the following input variables:

• M_ℓνb, the invariant mass of the charged lepton, the neutrino, and the jet assigned to be the b jet from the top quark decay.

• b_NN: the output of the jet-flavor separator.

• The number of b-tagged jets.

• Q × η: the charge of the lepton times the pseudorapidity of the jet assigned to be the t-channel recoiling jet.

• The smallest ΔR between any two jets, where ΔR is the distance between a pair of jets in the φ-η plane.

• The invariant mass of the two jets not assigned to have come from top quark decay.

• cos θ_ℓj: the cosine of the angle between the charged lepton and the jet assigned to be the t-channel recoiling jet, in the top quark's rest frame.

• The transverse momentum of the lowest-E_T jet.

• The pseudorapidity of the reconstructed W boson.

• The transverse momentum of the b jet from top quark decay.
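The smallest ΔR between jets uses ΔR = √(Δη² + Δφ²). Here Δφ must be wrapped into [−π, π) so that jets on either side of φ = ±π are treated as neighbors. A minimal sketch, with hypothetical function names:

```python
import itertools
import math


def delta_r(eta1, phi1, eta2, phi2):
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi   # wrap into [-pi, pi)
    return math.hypot(eta1 - eta2, dphi)


def min_delta_r(jets):
    """Smallest Delta R between any pair of jets given as (eta, phi) tuples."""
    return min(delta_r(*a, *b) for a, b in itertools.combinations(jets, 2))


# Jets at phi = 3.0 and phi = -3.0 are only 2*pi - 6 apart in azimuth.
print(min_delta_r([(0.0, 0.0), (0.0, 3.0), (0.0, -3.0)]))
```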

5. Distributions 

In each data sample, distinguished by the number of 
identified jets and the number of b tags, a likelihood func- 
tion is constructed with the input variables described 
above. The outputs lie between zero and one, where zero 
is background-like and one is signal-like. The predicted 
distributions of the signals and the expected background 
processes are shown in Fig. 21 for the four b-tag and jet categories. The templates, each normalized to unit area, are shown separately to indicate the separation power for the small signal. The sums of the predictions, normalized to our signal and background models described in Sections IV and V, respectively, are compared with the data. Figure 22(a) shows the discriminant output distributions for the data and the predictions summed over all four b-tag and jet categories.



6. Validation 

The distributions of the input variables to each likeli- 
hood function are checked in the zero-, one-, and two-tag 
samples for two- and three-jet events. Some of the most 
important variables' validation plots are shown in Sections IV E and VIII. The good agreement seen between
the predictions and the observations in both the input 
variables and the output variables gives confidence in the 
validity of the technique. 

Each likelihood function is also tested in the untagged sample, although the input variables that depend on b-tagging must be modified to make the test. For example:

- b_NN is fixed to −1 for untagged events;
- Q × η uses the jet with the largest |η| instead of the untagged jet;
- the taggable jet with the highest E_T is used as the b-tagged jet in variables that take the b-tagged jet as an input.

The modeling of the modified likelihood function in the untagged events is not perfect, as can be seen in Fig. 22(b). This mismodeling is covered by the systematic uncertainties on the ALPGEN modeling of W+jets events, which constitute the bulk of the background. Specifically, the observed discrepancy is covered by using the untagged data as the model for mistagged W+jets events, together with the shape uncertainties on ΔR_jj and η_j2.



7. Background Likelihood Functions 

Another validation of the Monte Carlo modeling and 
the likelihood function discriminant technique is given by 
constructing discriminants that treat each background 
contribution separately as a signal. These discriminants 
then can be used to check the modeling of the rates and 
distributions of the likelihood function outputs for each 
background in turn by purifying samples of the targeted 
backgrounds and separating them from the other components. The same procedure as in Equation (10) is followed, except that k = 2, 3, 4, or 5, corresponding to the Wbb̄, tt̄, Wcc̄/Wc, and W+LF samples, respectively; only the numerator of Equation (10) changes. Each of these discriminants works in the same way as the signal discriminant, but instead separates one category of background from the other categories and from the signals. Distributions of L_W+bottom, L_tt̄, L_W+charm, and L_W+LF are shown in Fig. 23 for b-tagged W+2 jet events passing our event selection. The modeling of the rates and shapes of
these distributions gives us confidence that the individual 
background rates are well predicted and that the input 
variables to the likelihood function are well modeled for 
the main background processes, specifically in the way 
that they are used for the signal discriminant. 
FIG. 21: Templates of predictions for the signal and background processes, each scaled to unit area (left), and comparisons of the data with the sum of the predictions (right) of the likelihood function for each selected data sample. Single top quark events are predominantly found on the right-hand sides of the histograms, while background events are mostly found on the left-hand sides. The two-jet, one-b-tag plots are shown on a logarithmic vertical scale for clarity, while the others are shown on a linear scale. The data are indicated by points with error bars, and the predictions are shown stacked, with the stacking order following that of the legend.
FIG. 22: Comparison of the data with the sum of the predictions of the likelihood function for the sum of all selected data samples (left), and for two-jet one-tag events applied to the untagged sideband (right), the latter with appropriate modifications to variables that rely on b-tagging. The stacking order follows that of the legend. The discrepancies between the prediction and the observation in the untagged sideband seen here are covered by systematic uncertainties on the W+jets background model.
FIG. 23: Distributions of L_W+bottom, L_tt̄, L_W+charm, and L_W+LF for b-tagged W+2 jet events passing our event selection. The signal and background contributions are normalized to the same predicted rates that are used in the signal extraction histograms. In each plot, the background process that the discriminant treats as signal is stacked on top of the other background processes. The stacking orderings follow those of the legends.



B. Matrix Element Method

The matrix element (ME) method relies on the evaluation of event probabilities for signal and background processes based on calculations of the relevant SM differential cross sections. These probabilities are calculated on an event-by-event basis for the signal and background hypotheses. They quantify how likely it is that the event originated from a given signal or background process. Rather than combining many complicated variables, the matrix element method uses only the measured energy-momentum four-vectors of each particle to perform its calculation. The mechanics of the method as it is used here are described below. Further information about this method can be found in

1. Event Probability

If we could measure the four-vectors of the initial and final state particles very precisely, the event probability for a specific process would be

P = \frac{d\sigma}{\sigma},

where σ is the total cross section of the process and the differential cross section is given by

d\sigma = \frac{(2\pi)^4 |\mathcal{M}|^2}{4\sqrt{(q_1\cdot q_2)^2 - m_{q_1}^2 m_{q_2}^2}}\, d\Phi_n(q_1+q_2; p_1,\ldots,p_n),    (11)

where M is the Lorentz-invariant matrix element for the process under consideration; q_1, q_2 and m_{q_1}, m_{q_2} are the four-momenta and masses of the incident particles; and dΦ_n is the n-body phase space given by:

d\Phi_n(q_1+q_2; p_1,\ldots,p_n) = \delta^4\Big(q_1+q_2-\sum_{i=1}^{n} p_i\Big) \prod_{i=1}^{n} \frac{d^3 p_i}{(2\pi)^3\, 2E_i}.    (12)



However, several effects have to be considered: (1) the partons in the initial state cannot be measured, (2) neutrinos in the final state are not measured directly, and (3) the energy resolution of the detector cannot be ignored. To address the first point, the differential cross section is weighted by parton distribution functions. To address the second and third points, we integrate over all particle momenta that we do not measure (the momentum of the neutrino) or do not measure well because of resolution effects (the jet energies). The integration gives a weighted sum over all possible parton-level variables y leading to the observed set of variables x measured with the CDF detector. The mapping between the particle variables y and the measured variables x is established with the transfer function W(y,x), which encodes the detector resolution and is described in Section VII B 2. Thus, the event probability takes the form



P(x) = \frac{1}{\sigma} \int d\sigma(y)\, dq_1\, dq_2\, f(q_1/p_{beam})\, f(q_2/p_{beam})\, W(y,x),    (13)



where dσ(y) is the differential cross section in terms of the particle variables, and f(q_i/p_beam) are the PDFs, which are functions of the fraction of the proton momentum p_beam carried by quark i. For the purposes of this calculation, the initial quark momentum is assumed to point along the beam axis. Substituting Equations (11) and (12) into Equation (13) transforms the event probability to



P(x) = \frac{1}{\sigma} \int 2\pi^4 |\mathcal{M}|^2 \frac{f(E_{q_1}/E_{beam})\, f(E_{q_2}/E_{beam})}{E_{q_1} E_{q_2}}\, W(y,x)\, d\Phi_4\, dE_{q_1}\, dE_{q_2},    (14)

where we have used the approximation \sqrt{(q_1\cdot q_2)^2 - m_{q_1}^2 m_{q_2}^2} \approx 2E_{q_1}E_{q_2}, neglecting the masses and transverse momenta of the initial partons.
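The constant 2π⁴ in Eq. (14) follows from Eq. (11) once the flux factor is approximated. For massless partons moving head-on along the beam axis, q_1·q_2 = 2E_{q_1}E_{q_2}, so that

```latex
\sqrt{(q_1\cdot q_2)^2 - m_{q_1}^2 m_{q_2}^2} \approx q_1\cdot q_2 = 2E_{q_1}E_{q_2}
\quad\Longrightarrow\quad
\frac{(2\pi)^4}{4\cdot 2E_{q_1}E_{q_2}} = \frac{16\pi^4}{8E_{q_1}E_{q_2}} = \frac{2\pi^4}{E_{q_1}E_{q_2}}
```

For such massless partons collinear with the beam, dq_i = dE_{q_i} and q_i/p_beam = E_{q_i}/E_beam, which accounts for the remaining changes between Eqs. (13) and (14).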

We calculate the squared matrix element |M|² for the event probability at LO using the HELAS (HELicity Amplitude Subroutines for Feynman Diagram Evaluations) package [11]. The correct subroutine calls for a given process are generated automatically by MADGRAPH [50]. We calculate event probabilities for all significant signal and background processes that can be easily modeled to first order: s-channel and t-channel single top quark production, as well as the Wbb̄, Wcg, and Wgg processes (shown in Fig. 5) and the tt̄ process (Fig. ^). The Wcg and Wgg processes are calculated only for two-jet events, because they contribute very little to the three-jet background.

The matrix elements correspond to fixed-order tree- 
level calculations and thus are not perfect representations 
of the probabilities for each process. Since the integrated 
matrix elements are not interpreted as probabilities but 
instead are used to form functions that separate signal 
events from background events, the choice of the matrix 
element calculation affects the sensitivity of the analysis 
but not its accuracy. The fully simulated Monte Carlo 
uses parton showers to approximate higher-order effects 
on kinematic distributions, and systematic uncertainties 
are applied to the Monte Carlo modeling in this analysis 
in the same way as for the other analyses. 

While the matrix-element analysis does not directly 
use input variables that are designed to separate signals 
from backgrounds based on specific kinematic properties 
such as M_ℓνb, the information carried by these reconstructed variables is represented in the matrix element probabilities. For M_ℓνb in particular, the pole in the top quark propagator in M provides sensitivity to this reconstructed quantity. The other multivariate analyses use the best-fit kinematics corresponding to the measured quantities in each event. The matrix element analysis instead integrates over the unknown parton momenta, which extracts more information and also makes use of the measurement uncertainties.



2. Transfer Functions 

The transfer function, W(y,x), is the probability of measuring the set of observable variables x given specific values of the parton variables y. For well-measured quantities, W(y,x) is taken as a δ-function; that is, the measured momenta are used directly in the differential cross section calculation. When the detector resolution cannot be ignored, W(y,x) is a parameterized resolution function based on fully simulated Monte Carlo events. For unmeasured quantities, such as the three components of the neutrino's momentum, the transfer function is constant. Including a transfer function between the neutrino's transverse momentum and E̸_T would double-count the transverse momentum sum constraint. The choice of transfer function affects the sensitivity of the analysis but not its accuracy, since the same transfer function is applied to both the data and the Monte Carlo samples.

The energies of charged leptons are measured relatively well with the CDF detector, so we assume δ-functions for their transfer functions. The angular resolution of the calorimeter and the muon chambers is also good, and we assume δ-functions for the transfer functions of the charged lepton and jet directions. The resolution of jet energies, however, is broad, and it is described by a transfer function W_jet(E_parton, E_jet).

The jet energy transfer functions map parton energies 
to measured jet energies after correction for instrumental 
detector effects [4§] . This mapping includes effects of ra- 
diation, hadronization, measurement resolution, and en- 
ergy outside the jet cone not included in the reconstruc- 
tion algorithm. The jet transfer functions are obtained by 
parameterizing the jet response in fully simulated Monte 
Carlo events. We parameterize the distribution of the 
difference between the parton and jet energies as a sum 
of two Gaussian functions: one to account for the sharp 
peak and one to account for the asymmetric tail. We 
determine the parameters of W_jet(E_parton, E_jet) by performing a maximum likelihood fit to jets in events passing the selection requirements. The jets are required to be matched, within a cone of ΔR < 0.4, to a quark or gluon coming from the hard scattering process.
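The double-Gaussian parameterization can be sketched as a function of ΔE = E_parton − E_jet. In this sketch the five parameters are constants supplied by the caller. In practice they would come from the maximum likelihood fit described above and may depend on the parton energy. A second, broader Gaussian with a shifted mean is one way to mimic the asymmetric tail. The function name and the parameter values in the usage line are hypothetical.

```python
import math


def double_gaussian_tf(e_parton, e_jet, params):
    """Jet-energy transfer function W_jet(E_parton, E_jet) as a sum of two Gaussians
    in dE = E_parton - E_jet, normalized to unit area in E_jet.

    params = (mean1, sigma1, mean2, sigma2, frac2): the first Gaussian models the
    sharp peak; the second (weight frac2) models the broader, shifted tail.
    """
    m1, s1, m2, s2, f2 = params
    de = e_parton - e_jet

    def gauss(x, mean, sigma):
        return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

    return (1.0 - f2) * gauss(de, m1, s1) + f2 * gauss(de, m2, s2)


# Probability density of measuring a 95 GeV jet from a 100 GeV parton (toy parameters).
print(double_gaussian_tf(100.0, 95.0, (0.0, 8.0, 10.0, 25.0, 0.3)))
```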

We create three transfer functions: one for b jets, which is constructed from the b quark from top quark decay in s-channel single top quark events; one for light jets, which is constructed from the light quark in t-channel single top quark events; and one for gluons, which is constructed
from the radiated gluon in Wcg events. In each process, 
the appropriate transfer function is used for each final- 
state parton. 



3. Integration 

To account for poorly measured variables, the
differential cross section must be integrated over all variables:
14 variables for two-jet events, corresponding to the
momentum vectors of the four final-state particles (12
variables) and the longitudinal momenta of the initial-state
partons (2 variables). There are 11 delta functions
inside the integrals: four for total energy and momentum
conservation and seven in the transfer functions (three
for the charged lepton's momentum vector and four for
the jet angles). The calculation of the event probability
therefore involves a three-dimensional integration. The
integration is performed numerically over the energies of
the two quarks and the longitudinal momentum of the
neutrino ($p_z^\nu$). For three-jet events, the additional jet
adds one more dimension to the integral.
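The dimension count in the paragraph above can be written out explicitly. The three-jet line applies the same bookkeeping to five final-state particles, with two angular $\delta$ functions per jet:

$$
\begin{aligned}
n^{\rm 2\,jet}_{\rm int} &= \underbrace{4 \times 3}_{\text{final-state momenta}} + \underbrace{2}_{\text{initial-state } p_z} - \underbrace{(4 + 3 + 2 \times 2)}_{\delta\ \text{functions}} = 14 - 11 = 3, \\
n^{\rm 3\,jet}_{\rm int} &= 5 \times 3 + 2 - (4 + 3 + 3 \times 2) = 17 - 13 = 4.
\end{aligned}
$$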

Because it is not possible to tell which parton resulted
in a given jet, we try all possible parton-jet assignments,
using the b-tagging information when possible. These
probabilities are then added together to create the final
event probability.
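The sum over assignments can be sketched as follows. The sketch assumes a hypothetical callable `prob` that returns the integrated probability for one parton-to-jet assignment. The text does not say exactly how the b-tag information enters, so the optional `allowed` filter, which vetoes assignments, is only one possible approach.

```python
from itertools import permutations
from typing import Callable, Optional, Sequence, Tuple

# A parton-to-jet assignment: ((parton label, jet label), ...)
Assignment = Tuple[Tuple[str, str], ...]


def event_probability(partons: Sequence[str], jets: Sequence[str],
                      prob: Callable[[Assignment], float],
                      allowed: Optional[Callable[[Assignment], bool]] = None) -> float:
    """Sum a per-assignment probability over all parton-to-jet assignments.

    `prob` stands in for the integrated matrix-element probability of one
    assignment (hypothetical). `allowed`, if given, skips assignments that
    are inconsistent with, for example, the b-tag information.
    """
    total = 0.0
    for ordering in permutations(jets, len(partons)):
        assignment = tuple(zip(partons, ordering))
        if allowed is not None and not allowed(assignment):
            continue
        total += prob(assignment)
    return total
```

With two partons `["b", "q"]` and two jets, there are two assignments. A filter that requires the `b` parton to sit on the tagged jet, and only there, keeps just one of them.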

Careful consideration must be given to $t\bar t$ events falling
into the W + 2 jet and W + 3 jet samples because these
events have final-state particles that are not observed. In
two-jet events, these missing particles could be a charged
lepton and a neutrino (in the case of $t\bar t \to \ell^+\nu\,\ell^-\bar\nu\,b\bar b$
decays) or two quarks (in the case of $t\bar t \to \ell\nu\,q\bar q'\,b\bar b$
decays). Since both pairs are decay products of a
W boson, in either case we treat the matrix element as
having a final-state W boson that is missed in the
detector. The particle assignment is not always correct, but
the purpose of the calculation is to construct variables
that have maximal separation power between signal and
background events, not to produce a correct
assignment of particles in each event. The choice of which
particles are assumed to have been missed is a matter of
optimizing the analysis and does not affect the validity
of the result. We integrate over the three components of
the hypothetical missing W boson's momentum,
resulting in a six-dimensional integral. In the three-jet case,
we integrate over the momentum of one of the quarks from
the W boson decay.

The numerical integration for the simpler two-jet $s$-
and $t$-channel and $Wb\bar b$ diagrams is performed using
an adaptation of the CERNLIB routine RADMUL [ooj].
This is a deterministic adaptive quadrature method that
performs well for lower-dimensional integrals. For the
higher-dimensional integrations needed for the three-jet and
$t\bar t$ matrix elements, a faster integrator is needed. We
use the DIVONNE algorithm implemented in the CUBA
library [oi], which uses a Monte-Carlo-based technique of
stratified sampling over quasi-random numbers to
produce its answer.

4. Event Probability Discriminant

Event probabilities for all processes are calculated for
both data events and Monte Carlo simulated events.
For each event, we use the event probabilities
as ingredients to build an event probability
discriminant (EPD), a variable for which the distributions of
signal events and background events are as different as
possible. Motivated by the Neyman-Pearson lemma [9^],
which states that a likelihood ratio is the most sensitive
variable for separating hypotheses, we define the EPD to
be $EPD = P_s/(P_s + P_b)$, where $P_s$ and $P_b$ are estimates
of the signal and background probabilities, respectively.
This discriminant is close to zero if $P_b \gg P_s$ and close to
unity if $P_s \gg P_b$. There are four EPD functions in all,
for W + two- or three-jet events with one or two b tags.

Several background processes in this analysis have no
b jet in the final state, and the matrix element
probabilities do not include detector-level discrimination between
b jets and non-b jets. In order to include this extra
information, we define the b-jet probability as $b = (b_{NN} + 1)/2$
and use it to weight each matrix element probability by
the b flavor probability of its jets. Since single top quark
production always has a b quark in the final state, we
write the event probability discriminant as:

$$
EPD = \frac{b \cdot P_s}{b \cdot \left(P_s + P_{Wb\bar b} + P_{t\bar t}\right) + (1-b) \cdot \left(P_{Wc\bar c} + P_{Wcg} + P_{Wgg}\right)}, \tag{15}
$$

where $P_s = P_{s\text{-channel}} + P_{t\text{-channel}}$. Each probability is
multiplied by an arbitrary normalization factor, which is
chosen to maximize the expected sensitivity. Different
values are chosen in each b-tag and jet category in order
to maximize the sensitivity separately in each. The
resulting templates and distributions are shown for all four
EPD functions in their respective selected data
samples in Fig. 24. All of them provide good separation
between single top quark events and background events.
The sums of predictions normalized to our signal and
background models, which are described in Sections |V]
and IIVI, respectively, are compared with the data.
Figure 25(a) corresponds to the sum of all four b-tag and jet
categories.

5. Validation

We validate the ability of the Monte Carlo to
predict the distribution of each EPD by checking the
untagged W + jets control samples, setting $b_{NN} = 0.5$ so
that it does not affect the EPD. An example is shown in
Fig. 25(b) for W + two-jet events. The agreement in this
control sample gives us confidence that the information
used in this analysis is well modeled by the Monte Carlo
simulation.

Because the $t\bar t$ background is the most signal-like of
the background contributions in this analysis, the
matrix element distribution is specifically checked in the
b-tagged four-jet control sample, which is highly enriched
in $t\bar t$ events. Each EPD function is validated in this way,
for two or three jets and one or two b tags, using the
highest-$E_T$ jets in W + four-jet events with the appropriate
number of b tags. An example is shown in Fig. 26
for the two-jet one-b-tag EPD function.
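A minimal sketch of the discriminant in Eq. (15), assuming the per-process event probabilities have already been computed. The process keys (`"s_channel"`, `"Wbb"`, and so on) are illustrative names. The optional `norms` mapping is also hypothetical; it stands in for the arbitrary per-category normalization factors, which are not quoted in the text.

```python
from typing import Mapping, Optional

B_HYPOTHESES = ("Wbb", "ttbar")           # background hypotheses with b jets
NON_B_HYPOTHESES = ("Wcc", "Wcg", "Wgg")  # background hypotheses without b jets


def b_jet_probability(b_nn: float) -> float:
    """Map the flavor-separator output b_NN in [-1, 1] to b in [0, 1]."""
    return (b_nn + 1.0) / 2.0


def epd(b_nn: float, probs: Mapping[str, float],
        norms: Optional[Mapping[str, float]] = None) -> float:
    """Event probability discriminant of Eq. (15).

    Each probability is first multiplied by its normalization factor
    (default 1). The denominator is assumed to be nonzero.
    """
    norms = norms or {}
    p = {key: value * norms.get(key, 1.0) for key, value in probs.items()}
    b = b_jet_probability(b_nn)
    p_signal = p["s_channel"] + p["t_channel"]
    numerator = b * p_signal
    denominator = (b * (p_signal + sum(p[k] for k in B_HYPOTHESES))
                   + (1.0 - b) * sum(p[k] for k in NON_B_HYPOTHESES))
    return numerator / denominator
```

At $b_{NN} = +1$ only the hypotheses with b jets contribute to the denominator. At $b_{NN} = -1$ the EPD is zero regardless of the kinematic probabilities.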



C. Artificial Neural Network

A different approach uses artificial neural networks
(NN) to combine sensitive variables to distinguish
single top quark signal from background events. As with
the neural network flavor separator $b_{NN}$ described in
Section IV D, the NeuroBayes [83] package is used to create
the neural networks. We train a different neural network
in each selected data sample, indexed by the number
of jets, the number of b-tagged jets, and whether the
charged lepton candidate is a triggered lepton or an EMC
lepton. For all samples, the $t$-channel Monte Carlo is
used as the signal training sample except for the two-jet
two-b-tag events, in which $s$-channel events are treated
as signal. The background training sample is a mix of
standard model processes in the ratios of the estimated
yields given in Tables |T] and |lTl.

Each training starts with more than fifty variables, but 
the training procedure removes those with no significant 
discriminating power, reducing the number to 11-18 vari- 
ables. Each neural network has one hidden layer of 15 
nodes and one output node. 
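The network shape described above (11 to 18 inputs, one hidden layer of 15 nodes, and one output node) can be sketched as a plain feed-forward pass. This is not NeuroBayes, whose preprocessing, activation functions, and training are not described here. The sketch assumes `tanh` activations, which match the $-1$ to $+1$ output range of the discriminant, and it uses random, untrained weights.

```python
import math
import random
from typing import List, Sequence, Tuple

# (hidden weights, hidden biases, output weights, output bias)
Network = Tuple[List[List[float]], List[float], List[float], float]


def init_network(n_inputs: int, n_hidden: int = 15, seed: int = 0) -> Network:
    """Random, untrained weights for an n_inputs -> n_hidden -> 1 network."""
    rng = random.Random(seed)
    w_hidden = [[rng.gauss(0.0, 1.0 / math.sqrt(n_inputs)) for _ in range(n_inputs)]
                for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.gauss(0.0, 1.0 / math.sqrt(n_hidden)) for _ in range(n_hidden)]
    return w_hidden, b_hidden, w_out, 0.0


def forward(net: Network, x: Sequence[float]) -> float:
    """One tanh hidden layer and a tanh output node, so the output is in [-1, 1]."""
    w_hidden, b_hidden, w_out, b_out = net
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return math.tanh(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
```

Training would adjust the weights so that signal events are pushed toward $+1$ and background events toward $-1$. That step is omitted here.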

As in other cases, the transverse momentum of the
neutrino is inferred from the missing transverse energy
($\not\!\!E_T$) of the event. The component of the momentum of
the neutrino along the beam axis, $p_z^\nu$, is calculated from
the assumed mass of the W boson and the measured
energy and momentum of the charged lepton. A quadratic
equation in $p_z^\nu$ must be solved. If there is one real
solution, we use it. If there are two real solutions, we use
the one with the smaller $|p_z^\nu|$. If the two solutions are
complex, a kinematic fit that varies the transverse
components of $\not\!\!E_T$ is performed to find a solution as close
as possible to the measured $\not\!\!E_T$ [111 that results in
a real $p_z^\nu$.
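The $p_z^\nu$ reconstruction can be sketched as follows. Treating the charged lepton and neutrino as massless, the constraint $m_W^2 = (p_\ell + p_\nu)^2$ becomes a quadratic in $p_z^\nu$. The value $m_W = 80.4$ GeV/$c^2$ is an assumed input. The kinematic fit used for complex solutions is not implemented: the function simply returns `None` in that case.

```python
import math
from typing import Optional, Tuple

M_W = 80.4  # GeV/c^2, assumed W boson mass


def neutrino_pz(lep: Tuple[float, float, float], met: Tuple[float, float],
                m_w: float = M_W) -> Optional[float]:
    """Solve the W-mass constraint for the neutrino longitudinal momentum.

    lep = (px, py, pz) of a massless charged lepton; met = (px, py) of the
    neutrino, taken from the missing transverse energy. Returns the real
    solution with the smaller |pz|, or None if the solutions are complex
    (the kinematic fit used in that case is not implemented here).
    """
    lpx, lpy, lpz = lep
    npx, npy = met
    e_lep = math.sqrt(lpx**2 + lpy**2 + lpz**2)
    pt_lep2 = lpx**2 + lpy**2
    pt_nu2 = npx**2 + npy**2
    mu = 0.5 * m_w**2 + lpx * npx + lpy * npy
    # Quadratic: pt_lep2 * pz^2 - 2 * mu * lpz * pz + (e_lep^2 * pt_nu2 - mu^2) = 0.
    # Its reduced discriminant simplifies to e_lep^2 * (mu^2 - pt_lep2 * pt_nu2).
    disc = e_lep**2 * (mu**2 - pt_lep2 * pt_nu2)
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    candidates = ((mu * lpz + root) / pt_lep2, (mu * lpz - root) / pt_lep2)
    return min(candidates, key=abs)
```

The solutions become complex when the lepton-neutrino transverse mass exceeds $m_W$, for example when a 60 GeV lepton and 60 GeV of $\not\!\!E_T$ are back to back.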

If only one jet is b-tagged, it is assumed to be from
top quark decay. If there is more than one b-tagged jet,
the jet with the largest x -q is chosen. More detailed
information about this method can be found in 1621.



1. Input Variables

The variables used in each network are summarized in
Table III. Descriptions of the variables follow.

• $M_{\ell\nu b}$: The reconstructed top quark mass.

• $M_{\ell\nu bb}$: The reconstructed mass of the charged
lepton, the neutrino, and the two b-tagged jets in the
event.

• $M_T^{\ell\nu b}$: The transverse mass of the reconstructed top
quark.



TABLE III: Summary of variables used in the different neural
networks in this analysis. An explanation of the variables is
given in the text. (Columns: 2-jet 1-tag, 2-jet 2-tag, 3-jet
1-tag, 3-jet 2-tag; an X marks each network in which a
variable is used.)

• $M_{jj}$: The invariant mass of the two jets. In the
three-jet networks, all combinations of jets are
included as variables.

• $M_T^W$: The transverse mass of the reconstructed W
boson.

• $E_T^{b,\rm top}$: The transverse energy of the b quark from
top decay.

• $E_T^{b,\rm other}$: The transverse energy of the b quark not
from top decay.

• $\sum E_T^{jj}$: The sum of the transverse energies of the
two most energetic jets. In the three-jet one-tag
network, all combinations of two jets are used to
construct separate $\sum E_T^{jj}$ input variables.

• $E_T^{\rm light}$: The transverse energy of the untagged or
lowest-energy jet.
FIG. 24: Templates of predictions for the signal and background processes, each scaled to unit area (left) and comparisons of
the data with the sum of the predictions (right) of the ME discriminant EPD for each selected data sample. Single top quark
events are predominantly found on the right-hand sides of the histograms while background events are mostly found on the
left-hand sides. The data are indicated by points with error bars, and the predictions are shown stacked, with the stacking
order following that of the legend.

FIG. 25: Comparison of the data with the sum of the predictions of the matrix element discriminant for the sum of all selected
data samples (left). The discriminant output for two-jet one-b-tag events applied to the untagged W + two jets control sample
(right) shows that the Monte Carlo W + two jets samples model the ME distribution of the data well. The data are indicated
by points with error bars, and the predictions are shown stacked, with the stacking order following that of the legend.



(Figure panel: W + 4 Jets, ≥1 b Tag; horizontal axis: ME Discriminant.)



FIG. 26: The event probability discriminant for two-jet
one-b-tag events applied to the b-tagged W + four jets control
sample, showing that the Monte Carlo $t\bar t$ samples model the EPD
distribution of the data well. The data are indicated by points
with error bars, and the predictions are shown stacked, with
the stacking order following that of the legend.



• $p_T^\ell$: The transverse momentum of the charged
lepton.

• $p_T$: The magnitude of the vector sum of the
transverse momenta of the charged lepton, the
neutrino, and all the jets in the event.

• $H_T$: The scalar sum of the transverse energies of
the charged lepton, the neutrino, and all the jets in
the event.

• $\not\!\!E_T$: The missing transverse energy.

• $\not\!\!E_{T,\rm sig}$: The significance of the missing transverse
energy, as defined in Equation 21.



• $\cos\theta_{\ell j}$: The cosine of the angle between the charged
lepton and the untagged or lowest-energy jet in the
top quark's reference frame.

• $\cos\theta_{\ell W}^{W}$: The cosine of the angle between the
charged lepton and the reconstructed W boson in
the W boson's reference frame.

• $\cos\theta_{\ell W}^{t}$: The cosine of the angle between the
charged lepton and the reconstructed W boson in
the top quark's reference frame.

• $\cos\theta_{jj}$: The cosine of the angle between the two
most energetic jets in the top quark's reference
frame.

• $Q \times \eta$: The charge of the lepton multiplied by the
pseudorapidity of the untagged jet.

• $\eta_\ell$: The pseudorapidity of the charged lepton.

• $\eta_W$: The pseudorapidity of the reconstructed W
boson.

• $\sum \eta_j$: The sum of the pseudorapidities of all jets.

• $\Delta\eta_{jj}$: The difference in pseudorapidity of the two
most energetic jets. In the three-jet two-tag
network, the difference between the two least energetic
jets is also used.

• $\Delta\eta_{t,\rm light}$: The difference in pseudorapidity between
the untagged or lowest-energy jet and the
reconstructed top quark.

• $\sqrt{\hat s}$: The energy of the center-of-mass system of the
hard interaction, defined as that of the $\ell\nu b$ system plus the
recoiling jet.

• Centrality: The sum of the transverse energies of
the two leading jets divided by $\sqrt{\hat s}$.

• $b_{NN}$: The jet flavor separator neural network
output described in Section |Vll. For two-tag events,
the sum of the two outputs is used.
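A few of the event-level variables above can be computed from reconstructed four-momenta, as in this sketch. It treats all objects as massless, so that $E_T = p_T$. It also assumes a two-jet event, so that $\sqrt{\hat s}$ is the invariant mass of the lepton, the neutrino, and both jets, and it leaves the choice of the untagged jet to the caller. The `P4` class and `event_variables` function are hypothetical helpers.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class P4:
    """Four-momentum (E, px, py, pz) in GeV."""
    e: float
    px: float
    py: float
    pz: float

    def __add__(self, other: "P4") -> "P4":
        return P4(self.e + other.e, self.px + other.px,
                  self.py + other.py, self.pz + other.pz)

    @property
    def pt(self) -> float:
        return math.hypot(self.px, self.py)

    @property
    def eta(self) -> float:
        """Pseudorapidity (undefined for momenta along the beam axis)."""
        p = math.sqrt(self.px**2 + self.py**2 + self.pz**2)
        return 0.5 * math.log((p + self.pz) / (p - self.pz))

    @property
    def mass(self) -> float:
        m2 = self.e**2 - self.px**2 - self.py**2 - self.pz**2
        return math.sqrt(max(m2, 0.0))


def event_variables(lepton: P4, charge: int, neutrino: P4,
                    jets: Sequence[P4], untagged: P4) -> Dict[str, float]:
    """H_T, sqrt(s-hat), centrality, Q x eta, and delta-eta of the leading jets."""
    leading = sorted(jets, key=lambda j: j.pt, reverse=True)[:2]
    total = lepton + neutrino
    for jet in jets:
        total = total + jet
    sqrt_s_hat = total.mass
    return {
        "H_T": lepton.pt + neutrino.pt + sum(j.pt for j in jets),
        "sqrt_s_hat": sqrt_s_hat,
        "centrality": sum(j.pt for j in leading) / sqrt_s_hat,
        "q_eta": charge * untagged.eta,
        "delta_eta_jj": abs(leading[0].eta - leading[1].eta),
    }
```

Because $Q \times \eta$ multiplies the lepton charge by the untagged jet's pseudorapidity, flipping the lepton charge flips its sign while leaving all other variables unchanged.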

2. Distributions

In each data sample, distinguished by the number of
identified jets and the number of b tags, a neural network
is constructed with the input variables described above.
The outputs lie between −1.0 and +1.0, where −1.0 is
background-like and +1.0 is signal-like. The predicted
distributions of the signals and the expected background
processes are shown in Fig. 27 for the four b-tag and jet
categories. The templates, each normalized to unit area,
are shown separately, indicating the separation power for
the small signal. The sums of predictions normalized to
our signal and background models, which are described
in Sections |V] and IIVI, respectively, are compared with
the data. Figure 28(a) corresponds to the sum of all four
b-tag and jet categories.

3. Validation

The distributions of the input variables to each
neural network are checked in the zero-, one-, and two-tag
samples for two- and three-jet events. Comparisons of
the observed and predicted distributions of some of the
variables that confer the most sensitivity are shown in
Sections IV El and I VIII. The good agreement seen between
the predictions and the observations in both the input
variables and the output variables gives us confidence
in the Monte Carlo modeling of the output discriminant
distributions.

We validate the performance of each network by
checking it in the untagged sideband, appropriately modifying
variables that depend on tagging information. An
example is shown in Fig. 28(b). The agreement in this
sideband gives us confidence that the information used in this
analysis is well modeled by the Monte Carlo simulation.

4. High NN Discriminant Output

To gain confidence in the quality of the signal
contribution in the highly signal-enriched region of the NN
discriminant, we performed further studies. Requiring
an NN discriminant output above 0.4 in the event
sample with two jets and one b tag yields a
signal-to-background ratio of about 1:3. This subsample is
expected to be highly enriched in signal and is at the same
time large enough to check the Monte Carlo modeling of
the data. We compare the expectations of the signal and
background processes to the observed data in this subsample
for several highly discriminating variables. The agreement is good,
as shown, for example, for the invariant mass of the
charged lepton, the neutrino, and the b-tagged jet, $M_{\ell\nu b}$, in
Fig. 29(a). Since only very signal-like background events
are within this subsample, the background shapes are
very similar to the signal shapes. This is because
$M_{\ell\nu b}$ is one of the most important input variables of the
NN discriminant, so that requiring a high NN output sculpts
the background distribution into a signal-like shape. As a
consequence, the shape of this distribution does not carry
information as to whether a signal is present or absent.

To overcome the similar shapes of signal and
background events in the signal-enriched subsample, a special
neural network discriminant (NN′) is constructed in
exactly the same way as the original, but without $M_{\ell\nu b}$ as
an input. Since $M_{\ell\nu b}$ is highly correlated with other
original neural network input variables, such as (with
a correlation coefficient of 65%), iJx (45%), and $M_{jj}$
(24%), these variables are also omitted from the training of
the special NN′ discriminant. Despite the loss of
discrimination through the removal of some very important input
variables, the NN′ discriminant is still powerful enough
to enrich a subsample of events with signal. With the
requirement NN′ > 0.4, the signal-to-background ratio is
somewhat reduced compared with that of the original NN
discriminant. The benefit of this selection is that the
predicted distributions of the signal and background are now
more different from each other. We predict that
background events dominate at lower values of $M_{\ell\nu b}$, while
the single top quark signal is concentrated around the
reconstructed top quark mass of 175 GeV/$c^2$, as shown in
Fig. 29(b). Because of the more distinct shapes of the
signal and background expectations, the observed shape
of the $M_{\ell\nu b}$ distribution in data can no longer be explained by the
background prediction alone; a substantial number of
signal events is needed to describe the observed distribution.
The NN′ network is used only for this cross-check; it is
not included in the main results of this paper.



D. Boosted Decision Tree 

A decision tree classifies events with a series of binary
choices; each choice is based on a single variable. Each
node in the tree splits the sample into two subsamples,
and further nodes are built on each of those subsamples,
continuing until the number of events used to predict the
signal and background in a node drops below a set
minimum. In constructing a tree, for each node, the variable
used to split the node's data into subsamples and the
value of the variable on the boundary of the two
subsamples are chosen to provide optimal separation between
signal and background events. The same variable may
be used in multiple nodes, and some variables may not
be used at all. This procedure results in a set of
final nodes (leaves) with maximally different signal-to-background
ratios.
FIG. 27: Templates of predictions for the signal and background processes, each scaled to unit area (left) and comparisons of
the data with the sum of the predictions (right) of the neural network output for each signal region. Single top quark events
are predominantly found on the right-hand sides of the histograms while background events are mostly found on the left-hand
sides. The data are indicated by points with error bars, and the predictions are shown stacked, with the stacking order following
that of the legend.

FIG. 28: Comparison of the data with the sum of the predictions of the neural network output for the sum of all selected signal
data samples (left) and the neural network output for two-jet one-b-tag events applied to the untagged control sample (right), showing
close modeling of the data and good control over the W + light-flavor shape. The data are indicated by points with error bars,
and the predictions are shown stacked, with the stacking order following that of the legend.

FIG. 29: Comparison of the predictions and the data for $M_{\ell\nu b}$ for events with an output above 0.4 of the original NN (left) and
a specially trained NN′ (right) discriminant. The data are indicated by points with error bars, and the predictions are shown
stacked, with the stacking order following that of the legend.



Decision trees allow many input variables to be
combined into a single output variable with powerful
discrimination between signal and background. Additionally,
decision trees are insensitive to the inclusion of poorly 
discriminating input variables because the training al- 
gorithm will not use non-discriminating variables when 
constructing its nodes. In this analysis, we train a differ- 
ent boosted decision tree (BDT) in each data sample. We 
use the TMVA [93| package to perform this analysis [11] . 
The boosting procedure is described below. 

The criterion used to choose the variable used to split
each node's data and to set the value of the variable on
the boundary is to optimize the Gini index [q^],
$p(1-p) = sb/(s+b)^2$, where $p = s/(s+b)$ is the purity and $s$ and
$b$ are the numbers of signal and background events in the
node, respectively.
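The split search can be illustrated on a single variable. The sketch scans cut values and keeps the one that maximizes the decrease in the Gini index, weighting each daughter node by its number of events. This is the usual CART-style criterion; TMVA's exact normalization and event weighting may differ.

```python
from typing import Sequence, Tuple


def gini(s: float, b: float) -> float:
    """Gini index p(1 - p) = s b / (s + b)^2 of a node (0 for an empty node)."""
    n = s + b
    return 0.0 if n == 0 else s * b / n**2


def best_split(values: Sequence[float], is_signal: Sequence[bool]) -> Tuple[float, float]:
    """Scan cuts on one variable and return (cut value, Gini decrease).

    The decrease is G(parent) minus the event-count-weighted average of
    G(left) and G(right); the cut with the largest decrease is returned.
    """
    events = sorted(zip(values, is_signal))
    n = len(events)
    s_tot = sum(1 for _, sig in events if sig)
    b_tot = n - s_tot
    parent = gini(s_tot, b_tot)
    best_cut, best_gain = float("nan"), float("-inf")
    s_left = b_left = 0
    for i in range(n - 1):
        if events[i][1]:
            s_left += 1
        else:
            b_left += 1
        if events[i][0] == events[i + 1][0]:
            continue  # no cut can separate identical values
        n_left = i + 1
        n_right = n - n_left
        child = (n_left * gini(s_left, b_left)
                 + n_right * gini(s_tot - s_left, b_tot - b_left)) / n
        gain = parent - child
        if gain > best_gain:
            best_cut = 0.5 * (events[i][0] + events[i + 1][0])
            best_gain = gain
    return best_cut, best_gain
```

For a perfectly separable sample, the chosen cut lies between the two classes, and the gain equals the parent's Gini index, since both daughters are pure.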

A shortcoming of decision trees is their instability with 
respect to statistical fluctuations in the training sample 
from which the tree structure is derived. For example, 
if two input variables exhibit similar separation power, 
a fluctuation in the training sample may cause the algo- 
rithm to decide to use one variable early in the decision 
chain, while a slightly different training sample may re- 
sult in a tree which uses the other variable in its place, 
resulting in a substantially different tree. 



This problem is overcome by a boosting |97| procedure 
that extends this concept from one tree to several trees 
which form a "forest" of decision trees. The trees are 
derived from the same training ensemble by reweighting 
events, and are finally combined into a single classifier 






which is given by a weighted average of the individual 
decision trees. Boosting stabilizes the response of the
decision trees with respect to fluctuations in the training
sample and is able to considerably enhance the perfor- 
mance with respect to a single tree. 

This analysis uses the ADABOOST ^ (adaptive boost)
algorithm, in which the events that were misclassified in
one tree are multiplied by a common boost weight $\alpha$ in
the training of the next tree. The boost weight is derived
from the fraction of misclassified events, $r$, of the previous
tree,

$$
\alpha = \frac{1 - r}{r}. \tag{16}
$$

The resulting event classification $y_{\rm BDT}(x)$ for the
boosted tree is given by

$$
y_{\rm BDT}(x) = \sum_{i \in \rm forest} \ln(\alpha_i) \cdot h_i(x), \tag{17}
$$

where the sum is over all trees in the forest. Large (small)
values of $y_{\rm BDT}(x)$ indicate a signal-like (background-like)
event. The result $h_i(x)$ of an individual tree can either be
defined to be $+1$ ($-1$) for events ending up in a signal-like
(background-like) leaf node according to the majority of
training events in that leaf, or $h_i(x)$ can be defined as the
purity of the leaf node in which the event is found. We
found that the latter option performs better for the
single-tag samples, while the double-tag samples, which have
fewer events, perform better when trained with the
former option.
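Eqs. (16) and (17) can be sketched in a few functions. The misclassified fraction $r$ is computed from event weights. After boosting, the weights are rescaled so that their total is unchanged; that renormalization is an assumed convention, not something stated in the text.

```python
import math
from typing import List, Sequence


def misclassified_fraction(weights: Sequence[float], misclassified: Sequence[bool]) -> float:
    """Weighted fraction r of events that the previous tree classified wrongly."""
    total = sum(weights)
    return sum(w for w, bad in zip(weights, misclassified) if bad) / total


def boost_weight(r: float) -> float:
    """Eq. (16): alpha = (1 - r) / r; alpha > 1 (ln alpha > 0) when r < 0.5."""
    return (1.0 - r) / r


def reweight(weights: Sequence[float], misclassified: Sequence[bool],
             alpha: float) -> List[float]:
    """Multiply misclassified events by alpha, then restore the original total weight."""
    boosted = [w * alpha if bad else w for w, bad in zip(weights, misclassified)]
    scale = sum(weights) / sum(boosted)
    return [w * scale for w in boosted]


def bdt_response(tree_outputs: Sequence[float], alphas: Sequence[float]) -> float:
    """Eq. (17): y_BDT(x) = sum_i ln(alpha_i) * h_i(x)."""
    return sum(math.log(a) * h for a, h in zip(alphas, tree_outputs))
```

Trees with a small error fraction get a large $\ln\alpha_i$ and therefore dominate the forest's vote. The outputs $h_i(x)$ may be either $\pm 1$ or the leaf purity, as described above.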

While non-overlapping samples of Monte Carlo events 
are used to train the trees and to produce predictions of 
the distributions of their outputs, there is the possibility 
of "over-training" the trees. If insufficient Monte Carlo 
events are classified in a node of a tree, then the train- 
ing procedure can falsely optimize to separate the few 
events it has in the training sample and perform worse 
on a statistically independent testing sample. In order 
to remove statistically insignificant nodes from each tree 
we employ the cost complexity [l^ pruning algorithm. 
Pruning is the process of cutting back a tree from the 
bottom up after it has been built to its maximum size. 
Its purpose is to remove statistically insignificant nodes 
and thus reduce the over-training of the tree. 

The background processes included in the training are
$t\bar t$ and $Wb\bar b$ for the double-b-tag channels, and those as well as
$Wc$ and W + LF for the single-b-tag channels. Including
the non-dominant background processes is not found to
significantly increase the performance of the analysis.

1. Distributions 

In each data sample, distinguished by the number of
identified jets and the number of b tags, a BDT is
constructed with the input variables described above. The
output for each event lies between −1.0 and 1.0, where
−1.0 indicates the event has properties that make it
appear much more like a background event than a signal
event, and 1.0 indicates the event appears much more
likely to have come from a single top quark signal. The
predicted distributions of the signals and the expected
background processes are shown in Fig. 30 for the four b-tag
and jet categories. The templates, each normalized to
unit area, are shown separately, indicating the
separation power for the small signal. The sums of predictions
normalized to our signal and background models, which
are described in Sections IVland llVl, respectively, are
compared with the data. Figure 31(a) corresponds to the sum
of all four b-tag and jet categories.

2. Validation 

The distributions of the input variables to each BDT
are checked in the zero-, one-, and two-b-tag samples for
two- and three-jet events, and also in the four-jet sample
containing events with at least one b tag. Validation plots
for some of the most important variables are shown in
Sections IV El and I VIII. The good agreement seen between
the predictions and the observations in both the input
variables and the output variables gives us confidence
in the Monte Carlo modeling of the distributions of the
discriminant outputs.

We validate the modeling of the backgrounds in each
boosted tree by checking it in the sample of events with
no b tags, separately for events with two and three jets.
For variables depending on b-tagging information, such as
$M_{\ell\nu b}$ and $Q \times \eta$, the leading jet is chosen as the
"b-tagged" jet, and for the $b_{NN}$ variable the output value
is randomly taken from a W + LF template. An example
is shown in Fig. 31(b), which shows the two-jet, one-b-tag
BDT tested with the two-jet, zero-b-tag sample. The
dominant source of background tested in Fig. 31(b) is
W + LF, and the ALPGEN Monte Carlo predicts the BDT
output very well. We further test the four-jet sample with
one or more b tags, shown in Fig. 32, taking the leading
two jets to test the two-jet, one-b-tag BDT. The
dominant background in this test is $t\bar t$, and the good modeling
of the distribution of the output of the BDT by PYTHIA
raises our confidence that this background, too, is
modeled well in the data samples.



VIII. SYSTEMATIC UNCERTAINTIES 

The search for single top quark production and the 
measurement of the cross section require substantial in- 
put from theoretical models, Monte Carlo simulations, 
and extrapolations from control samples in data. We as- 
sign systematic uncertainties to our predictions and in- 
clude the effects of these uncertainties on the measured 
cross sections as well as the significance of the signal. 

We consider three categories of systematic uncertainty:
uncertainty in the predicted rates of the signal and
background processes, uncertainty in the shapes of the
distributions of the discriminant variables, and uncertainty
arising from the limited number of Monte Carlo events
used to predict the signal and background expectations in
each bin of each discriminant distribution. Sources of
uncertainty may affect multiple signal and background
components. The effects of systematic uncertainty from the
same source are considered to be fully correlated. For
example, the integrated luminosity estimate affects the
predictions of the Monte-Carlo-based background processes
and the signal, so the uncertainty on the integrated
luminosity affects all of these processes in a correlated way.
The effects of different sources of systematic uncertainty
are considered to be uncorrelated.

FIG. 30: Templates of predictions for the signal and background processes, each scaled to unit area (left) and comparisons of
the data with the sum of the predictions (right) of the boosted decision tree output for each data sample. Single top quark
events are predominantly found on the right-hand sides of the histograms while background events are mostly found on the
left-hand sides. The data are indicated by points with error bars, and the predictions are shown stacked, with the stacking
order following that of the legend.

FIG. 31: Comparison of the data with the sum of the predictions of the BDT output for the sum of all selected data samples
(left) and the BDT output for two-jet one-b-tag events applied to the untagged two-jet control sample (right), where the
dominant contributing process is W + light-flavor jets. The data are indicated by points with error bars, and the predictions
are shown stacked, with the stacking order following that of the legend.

FIG. 32: The BDT output for four-jet events containing one
or more b tags. The dominant source of background is $t\bar t$
events. The data are indicated with points and the stacked
histograms show the prediction, scaled to the total data rate,
with the stacking order following that of the legend.



The effects of all systematic uncertainties are included 
in the hypothesis tests and cross section measurements 
performed by each analysis, as described in Section HXl 
Detailed descriptions of the sources of uncertainty and 
their estimation are given below. 

A. Rate Uncertainties 

Rate uncertainties affect the expected contributions of 
the signal and background samples. Some sources have 
asymmetric uncertainties. All rate uncertainties are as- 
signed truncated Gaussian priors, where the truncation 
prevents predictions from being negative for any source 
of signal or background. The sources of rate uncertainties 
in this analysis are described below, and their impacts on 
the signal and background predictions are summarized in 
Table lYl 
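The treatment of rate uncertainties can be sketched as a pseudo-experiment generator. The sketch makes several assumptions. Each source of uncertainty has one standard-normal nuisance parameter that shifts every process it affects (full correlation within a source), and different sources are drawn independently. Relative shifts from different sources are added linearly. Draws that would make any prediction negative are rejected, which truncates the Gaussian prior. Asymmetric uncertainties are not handled. The ±6% luminosity and ±40% non-W values come from the list of sources; the process names and yields are hypothetical.

```python
import random
from typing import Dict, Mapping


def sample_rates(nominal: Mapping[str, float],
                 rel_unc: Mapping[str, Mapping[str, float]],
                 rng: random.Random) -> Dict[str, float]:
    """Draw one set of predicted rates under truncated Gaussian priors.

    rel_unc[source][process] is the symmetric relative rate uncertainty that
    a source induces on a process. One nuisance parameter per source is
    shared by all processes it affects; draws giving any negative
    prediction are rejected and redrawn.
    """
    while True:
        thetas = {source: rng.gauss(0.0, 1.0) for source in rel_unc}
        rates = {}
        for process, n in nominal.items():
            factor = 1.0
            for source, per_process in rel_unc.items():
                factor += thetas[source] * per_process.get(process, 0.0)
            rates[process] = n * factor
        if all(value >= 0.0 for value in rates.values()):
            return rates
```

In this example, a single luminosity draw moves the single top and $Wb\bar b$ predictions by the same relative amount, while the non-W rate fluctuates independently.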

• Integrated Luminosity: A symmetric
uncertainty of ±6% is applied to all Monte-Carlo-based
predictions. This uncertainty includes the
uncertainty in the $p\bar p$ inelastic cross section as well as
the uncertainty in the acceptance of CDF's
luminosity monitor. The requirement that the
primary vertex position in z is within ±60 cm of the
origin causes a small acceptance uncertainty that
is included as well.

• Theoretical Cross Sections: Our MC-based 
background processes are scaled to theoretical pre- 
dictions at NLO (or better). We apply the associ- 
ated theoretical uncertainties. We separate out the 
effects of the top quark mass from the other sources 
of uncertainty affecting the theoretical predictions. 
Not every theoretical cross section uncertainty is 
used in each result; details are given in Section HXl 
• Monte Carlo Generator: Different Monte Carlo
generators for the signal result in different
acceptances. The deviations are small but are still
included as a rate uncertainty on the signal
expectation, as described in Section HVl.

• Acceptance and Efficiency Scale Factors: The
predicted rates of the Monte Carlo background
processes and of the signals are affected by trigger
efficiency, mismodeling of the lepton identification
probability, and the b-tagging efficiency. Known
differences between the data and the simulation are
corrected for by scaling the prediction, and
uncertainties on these scale factors are collected together
in one source of uncertainty since they affect the
predictions in the same way.

• Heavy Flavor Fraction in W + jets: The
predictions of the $Wb\bar b$, $Wc\bar c$, and $Wc$ fractions in the
W + 2 jets and W + 3 jets samples are
extrapolated from the W + 1 jet sample as described in
Section |Vl. It is found that ALPGEN underpredicts
the $Wb\bar b$ and $Wc\bar c$ fractions in the W + 1 jet
sample by a factor of 1.4 ± 0.4. We assume that the
$Wb\bar b$ and $Wc\bar c$ predictions are correlated. The
uncertainty on this scale factor comes from the spread
in the measured heavy-flavor fractions using
different variables to fit the data, and from the difference
between the $Wb\bar b$ and $Wc\bar c$ scale factors. The $Wc$
prediction from ALPGEN is compared with CDF's
measurement [8l[ and is found not to require
scaling, but a separate, uncorrelated uncertainty is
assigned to the $Wc$ prediction, with the same relative
magnitude as the $Wb\bar b$ + $Wc\bar c$ uncertainty.

• Mistag Estimate: The method for estimating the
yield of events with incorrectly b-tagged jets is
described in Section IV D. The largest source of
systematic uncertainty in this estimate comes from
extrapolating from the negative tag rate in the data
to the positive tag rate, which requires estimating the
asymmetry between positive and negative light-flavor
tags. Other sources of uncertainty come from
differences in the negative tag rates of the different
data samples used to construct the mistag matrix.

• Non-W Multijet Estimate: The non-W rate
prediction varies when the $\not\!\!E_T$ distribution is
constructed with a different number of bins or when
different models are used for the non-W templates. The
$\not\!\!E_T$ fits also suffer from small data samples,
particularly in the double-tagged samples. A relative
uncertainty of ±40% is assessed on all non-W rate
predictions.

• Initial State Radiation (ISR): The model
used for ISR is PYTHIA's "backward evolution"
method [l^l. This uncertainty is evaluated by
generating new Monte Carlo samples for tt and single
top quark signals with $\Lambda_{\rm QCD}$ doubled or
halved, to produce samples with more ISR and less
ISR, respectively. Simultaneously, the initial
transverse momentum scale is multiplied or divided by
four, and the hard-scattering scale of the shower is
multiplied or divided by four, for more ISR and less
ISR, respectively. These variations are chosen by
comparing Drell-Yan Monte Carlo and data samples.
Dilepton distributions are compared as a function of
the dilepton invariant mass, and the more-ISR and
less-ISR prescriptions generously bracket the
available data [9^. Since the ISR prediction must be
extrapolated from the Z mass scale to the higher $Q^2$
scales of tt and single top quark events, the chosen
variation is deliberately much larger than is needed
to bracket the data.

• Final State Radiation (FSR): PYTHIA's model
of gluon radiation from partons emitted in the
hard-scattering interaction has been tuned with
high precision to LEP data [H^. Nonetheless,
uncertainty remains in the radiation from beam
remnants, and parameters analogous to those adjusted
for ISR are adjusted in PYTHIA for the final-state
showering, except for the hard-scattering scale
parameter. The effects of the ISR and FSR variations
are treated as 100% correlated with each other.
ISR and FSR rate uncertainties are not evaluated
for the W+jets Monte Carlo samples because those
rates are scaled to data-driven estimates with their
own uncertainties, and the kinematic shapes of the
W+jets predictions have factorization and
renormalization scale uncertainties applied, as
discussed below.

• Jet Energy Scale (JES): The calibration of the
calorimeter response to jets is a multi-step process,
and each step involves an uncertainty that is
propagated to the final jet energy scale [4^. Raw jet
energy measurements are corrected using test-beam
calibrations and for detector non-uniformity, multiple
interactions, and energy that lies outside the jet cone
and is therefore not assigned to the jet. The
uncertainties in the jet energy scale are incorporated
by processing all events in all Monte Carlo samples
with the jet energy scale varied upward and,
separately, downward. The kinematic properties of
each event are affected, and some events are
recategorized as having a different number of jets as
the jet $E_T$ values change, inducing correlated rate
and shape uncertainties. An example of the shape
uncertainty in the NN analysis's discriminant is shown
in Fig. 33.

• Parton Distribution Functions (PDF): The
PDFs used in this analysis are the CTEQ5L set of
leading-order PDFs [51]. To evaluate the systematic
uncertainties on the rates due to uncertainties in
these PDFs, we add in quadrature the differences
between the predictions of the following pairs of PDFs:

- CTEQ5L and MRST72 [lO^, PDF sets computed by
different groups. MRST72 is also a leading-order PDF
set.

- MRST72 and MRST75, which differ in their value of
$\alpha_s$: the former uses 0.1125 and the latter uses
0.1175.

- CTEQ6L and CTEQ6L1, which differ in the running
of $\alpha_s$: the former uses a 2-loop and the latter a
1-loop $\alpha_s$.

- The 20 signed eigenvectors of CTEQ6M, each
compared with the default CTEQ5L PDFs.

The PDF uncertainty induces a correlated rate and
shape uncertainty in the applicable templates.

FIG. 33: An example of systematically shifted shape
templates (panel: W + 2 jets, 1 b tag). This figure shows
the jet-energy-scale-shifted histograms for the single top
quark signal in two-jet, one-b-tag events for the NN
discriminant. The lower plot shows the relative difference
between the central shape and the two alternate shapes.
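The quadrature sum used for the PDF rate uncertainty can be sketched in a few lines of Python. This is a minimal illustration, assuming each PDF set's prediction for a given template rate is already available as a single number; the yields below are hypothetical placeholders, not values from this analysis, and the CTEQ6M eigenvector pairs are left out for brevity.

```python
import math

def pdf_rate_uncertainty(pairs):
    """Add in quadrature the differences between paired PDF predictions.

    `pairs` holds (prediction_a, prediction_b) tuples, one per PDF comparison
    (CTEQ5L vs MRST72, MRST72 vs MRST75, ...). Returns the absolute uncertainty.
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs))

# Hypothetical predicted yields (events) for one template under each PDF set.
yields = {"CTEQ5L": 100.0, "MRST72": 101.5, "MRST75": 102.0,
          "CTEQ6L": 99.0, "CTEQ6L1": 99.8}
pairs = [(yields["CTEQ5L"], yields["MRST72"]),
         (yields["MRST72"], yields["MRST75"]),
         (yields["CTEQ6L"], yields["CTEQ6L1"])]
# Each CTEQ6M eigenvector variation would add one more pair, compared with
# the default CTEQ5L prediction (omitted in this sketch).
sigma = pdf_rate_uncertainty(pairs)
print(f"absolute: {sigma:.3f} events, relative: {sigma / yields['CTEQ5L']:.2%}")
```

Because each pair enters only through its difference, adding more comparisons can only increase the assigned uncertainty.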

B. Shape-Only Uncertainties 

Many of the sources of rate uncertainty listed above 
also induce distortions in the shapes of the templates for 
the signals and background processes used to model the 
data. These include ISR, FSR, JES, and PDF uncertain- 
ties. Here we list the sources of shape uncertainties which 
do not have associated rate uncertainties. 

Shape uncertainty templates are all smoothed with a
median smoothing algorithm. This procedure takes the
ratio of each systematically shifted histogram to the
central histogram and replaces the ratio in each bin with
the median of the ratios in a five-bin window centered on
that bin. The first two bins and the last two bins are left
unaffected by this procedure. The five-bin window was
chosen as the minimum size that provides adequate
smoothing, as judged from many shape-variation ratio
histograms. The smoothed ratio histograms are then
multiplied by the central histograms to obtain the new
varied template histograms. This procedure reduces the
impact of limited Monte Carlo statistics in the bins of the
central and varied templates.
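The median smoothing step can be written as a short Python function. This is a sketch under stated assumptions: histograms are plain lists of bin contents with identical binning, and bins where the central prediction is empty are assigned a ratio of 1, a choice the text does not specify.

```python
import statistics

def median_smooth_template(central, shifted, window=5):
    """Smooth a systematically shifted template with a running median of ratios.

    Every bin except the first and last window // 2 bins has its
    shifted/central ratio replaced by the median ratio in a window centered
    on it; the smoothed ratios are then multiplied back onto `central`.
    """
    if len(central) != len(shifted):
        raise ValueError("histograms must have the same binning")
    half = window // 2
    # Assumption: an empty central bin gets ratio 1 (no shift).
    ratios = [s / c if c != 0 else 1.0 for c, s in zip(central, shifted)]
    smoothed = list(ratios)
    for i in range(half, len(ratios) - half):
        smoothed[i] = statistics.median(ratios[i - half:i + half + 1])
    return [c * r for c, r in zip(central, smoothed)]

central = [10.0] * 8
shifted = [11.0, 9.0, 10.5, 30.0, 10.5, 9.5, 10.0, 12.0]   # spike in bin 3
print(median_smooth_template(central, shifted))
```

The single-bin spike, typical of a low-statistics Monte Carlo fluctuation, is removed, while the edge bins keep their unsmoothed ratios.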

• Jet Flavor Separator Modeling: The distribution
of $b_{\rm NN}$ for light-flavor jets is found to require
a small correction, as described in Section V. The
full difference between the uncorrected light-flavor
Monte Carlo prediction and the data-derived corrected
distribution is taken as a one-sided systematic
uncertainty. Since a pure sample of charm jets is not
available in the data, a systematic uncertainty is also
assessed on the shape of the charm prediction, taking
the difference between the distribution predicted by
the Monte Carlo simulation and the Monte Carlo
distribution altered by the light-flavor correction
function. These shifts in the distributions of
$b_{\rm NN}$ for these samples are propagated to the
predicted shapes of the corresponding discriminant
output histograms.

• Mistag Model: To cover uncertainty in modeling
the shape of the analysis discriminant output
histograms for mistagged events, the untagged data,
weighted by the mistag matrix weights, are used to
make an alternate shape template for the mistags.
The untagged data largely consist of W+light-flavor
jets, but there is contamination from Wbb, Wcc, tt,
and even single top quark signal events, making the
estimate of the systematic uncertainty conservative.

• Factorization and Renormalization Scale:
Because ALPGEN performs fixed-order calculations
to create W+jets diagrams, it requires factorization
and renormalization scales as inputs. Both of these
scales are set for each event in our ALPGEN samples to

$$ \mu^2 = M_W^2 + \sum_{\rm partons} m_T^2, \tag{18} $$

where $M_W$ is the W boson mass and
$m_T^2 = m^2 + p_T^2/c^2$ is the transverse mass of
the generated parton. For light partons (u, d, s, g),
the mass m is approximately zero; $m_b$ is set to
4.7 GeV/c² and $m_c$ is set to 1.5 GeV/c². The sum is
over all final-state partons excluding the W boson
decay products. In addition, ALPGEN evaluates
$\alpha_s$ separately at each gqq and ggg vertex, and
the scale at which this is done is set to the transverse
momentum of the vertex. The three scales are halved
and doubled together to produce templates that cover
the scale uncertainty. Although ALPGEN's
W+heavy-flavor cross section predictions depend
strongly on the input scales, we do not assign
additional rate uncertainties on the W+heavy-flavor
yields because we do not use ALPGEN to predict
rates; the yields are calibrated using the data. We do
not use these calibrations of the yields to constrain
the values of the scales when estimating the shape
uncertainty; we prefer to take the customary variation
described above.

• Non-W Flavor Composition: The distribution
of $b_{\rm NN}$ is used to fit the flavor fractions in the
low-$\not\!\!E_T$ control samples in order to estimate
the central predictions of the flavor composition of
b-tagged jets in non-W events, as described in
Section V. The limited statistical precision of these
fits and the necessity of extrapolating to the
higher-$\not\!\!E_T$ signal region motivate an
uncertainty on the flavor composition. The central
predictions for the flavor composition are 45% b jets,
40% c jets, and 15% light-flavor jets. The "worst-case"
variation of the flavor composition, which we use to
set our uncertainty, is 60% b jets, 30% c jets, and 10%
light-flavor jets. The predicted yields are unchanged
by this uncertainty, but the distribution of $b_{\rm NN}$
is varied in a correlated way for each analysis and
propagated to the predictions of the discriminant
output histograms.

• Jet η Distribution: Checks of the untagged W + 2
jet control region show that the rate of jets at high
$|\eta|$ in the data is underestimated by the prediction
(Fig. 34(a)). Inaccurate modeling of the distribution
of this variable has a potentially significant impact on
the analysis because of the use of the sensitive variable
$Q \times \eta$, which is highly discriminating for events
with jets at large $|\eta|$. Three explanations for the
discrepancies between data and Monte Carlo are
possible: beam halo overlapping with real W+jets
events, miscalibration of the jet energy scale in the
forward calorimeters, and ALPGEN mismodeling. We
cannot distinguish between these possibilities with the
data, and we therefore reweight all Monte Carlo
samples by a weighting factor based on the ratio of the
data to the Monte Carlo prediction in the untagged
sideband, to make alternate shape templates for the
discriminants for all Monte Carlo samples. No
corresponding rate uncertainty is applied.

• Jet ΔR Distribution: Similarly, the distribution
of $\Delta R(j_1,j_2) = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$,
a measure of the angular separation between two jets,
is found to be mismodeled in the untagged control
sample (Fig. 34(b)). Modeling this distribution
correctly is important because our discriminants use
the input variable $M_{jj}$, which is highly correlated
with $\Delta R(j_1,j_2)$. The mismodeling of
$\Delta R(j_1,j_2)$ is believed to be due to the gluon
splitting fraction in ALPGEN, but since this conclusion
is not fully supported, we take as a systematic
uncertainty the difference in the predictions of all
Monte Carlo based templates after reweighting them
using the ratio of the untagged data to the prediction.
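The reweighting used for the jet η and ΔR shape uncertainties can be illustrated with the Python sketch below. It assumes the untagged-sideband data and Monte Carlo have already been histogrammed in the variable being corrected; the helper names, binning, counts, and events are hypothetical, and Δφ is wrapped into [−π, π], a standard convention the text does not state explicitly. Because no rate uncertainty is assigned, the sketch rescales the reweighted template to the nominal yield, which is also an assumption about how the shape-only variation is normalized.

```python
import bisect
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Delta R = sqrt((Delta eta)^2 + (Delta phi)^2), Delta phi wrapped to [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)

def sideband_weights(data_counts, mc_counts):
    """Per-bin data/MC ratio in the untagged sideband (weight 1 where MC is empty)."""
    return [d / m if m > 0 else 1.0 for d, m in zip(data_counts, mc_counts)]

def event_weight(value, bin_edges, weights):
    """Weight for one MC event, looked up from the bin containing `value`."""
    idx = bisect.bisect_right(bin_edges, value) - 1
    idx = min(max(idx, 0), len(weights) - 1)   # clamp under/overflow into edge bins
    return weights[idx]

def normalize_to(hist, total):
    """Rescale a histogram so that its integral equals `total`."""
    s = sum(hist)
    return [h * total / s for h in hist] if s > 0 else list(hist)

# Hypothetical |eta| binning and sideband counts.
edges = [0.0, 1.0, 2.0, 3.0]
w = sideband_weights([100, 30, 12], [100, 25, 8])   # [1.0, 1.2, 1.5]

# Hypothetical MC events: (jet |eta|, discriminant bin, event weight).
events = [(0.5, 0, 1.0), (1.5, 1, 1.0), (2.5, 1, 1.0), (0.2, 1, 1.0)]
nominal, alternate = [0.0, 0.0], [0.0, 0.0]
for eta, disc_bin, wt in events:
    nominal[disc_bin] += wt
    alternate[disc_bin] += wt * event_weight(eta, edges, w)
alternate = normalize_to(alternate, sum(nominal))   # shape-only variation
print(nominal, alternate)
```

The same lookup applies to ΔR by histogramming `delta_r` of the two leading jets instead of a jet's |η|.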



IX. INTERPRETATION 

The analyses presented in this paper have two goals: to 
evaluate the significance of the excess of events compared 
with the background prediction, and to make a precise 
measurement of the cross section. These goals have much 
in common: better separation of signal events from back- 
ground events and the reduction of uncertainties help im- 
prove both the cross section measurements and the ex- 
pected significance if a signal is truly present. But there 
are also differences. For example, the systematic uncer- 
tainty on the signal acceptance affects the precision of 
the cross section measurement, but it has almost no ef- 
fect on the observed significance level, and only a minor 
effect on the predicted significance level; Section IX D
discusses this point in more detail. More importantly,
a precision cross section measurement relies most on in- 
creasing acceptance and understanding the background 
in a larger sample. The significance of an excess, however, 
can be much larger if one bin in an analysis has a very low 
expected background yield and has data in it that are in- 
compatible with that background, even though that bin 
may not contribute much information to the cross section 
measurement. 

The contents of the low signal-to-background bins are 
important for the proper interpretation of the high signal- 
to-background bins. They serve as signal-depleted con- 
trol samples which can be used to help constrain the 
background predictions. Not all bins are fully depleted 
in signal, and the signal-to-background ratio varies from 
very small to about 2:1 in some analyses. Simultane- 
ous use of all bins' contents, comparing the observations 
to the predictions, is needed to optimally measure the 
cross section and to compute the significance. Systematic
uncertainties on the predicted rates and shapes of each
component of the background and the two signals
(s-channel and t-channel), and also bin-by-bin systematic
uncertainties, affect the extrapolation of the background
fits to the signal regions.

These considerations are addressed below, and the pro- 
cedures for measuring the cross section and the signifi- 
cance of the excess are performed separately. The han- 
dling of the systematic uncertainties is Bayesian, in that 
priors are assigned for the values of the uncertain nui- 
sance parameters, the impacts of the nuisance parame- 
ters on the predictions are evaluated, and integrals are 
performed as described below over the values of the nui- 
sance parameters. 



A. Likelihood Function 

The likelihood function we use in the extraction of the 
cross section and in the determination of the significance 
is the product of Poisson probabilities for each bin in 
each histogram of the discriminant output variable of 
each channel. Here, the channels are the non-overlapping
data samples defined by the number of jets, the number
of b tags, and whether the charged lepton candidate is a
triggered electron or muon or an extended muon coverage
candidate. We do not simply add the distributions of the
discriminants in these very different samples because
doing so would combine bins of higher signal purity with
bins of lower signal purity, diluting our sensitivity.

FIG. 34: Graphs showing the poor modeling of (a) the second jet pseudorapidity and (b) the distance between the two jets in the η-φ
plane. These are accounted for with systematic uncertainties on the shapes of the W+jets predictions. The data are indicated
by points with error bars, and the predictions are shown stacked, with the stacking order following that of the legend.

TABLE IV: Sources of systematic uncertainty considered in this analysis. Some uncertainties are listed as ranges, as the
impacts of the uncertain parameters depend on the numbers of jets and b tags, and on which signal or background component
is predicted. Sources listed below the double line are used only in the calculation of the p-value.

Source of uncertainty                     Rate     Shape  Processes affected
Jet energy scale                          0-16%    X      all
Initial state radiation                   0-11%    X      single top, tt
Final state radiation                     0-15%    X      single top, tt
Parton distribution functions             2-3%     X      single top, tt
Acceptance and efficiency scale factors   0-9%            single top, tt, diboson, Z/γ*+jets
Luminosity                                6%              single top, tt, diboson, Z/γ*+jets
Jet flavor separator                               X      all
Mistag model                                       X      W+light
Non-W model                                        X      Non-W
Factorization and renormalization scale            X      Wbb
Jet η distribution                                 X      all
Jet ΔR distribution                                X      all
Non-W normalization                       40%             Non-W
Wbb and Wcc normalization                 30%             Wbb, Wcc
Wc normalization                          30%             Wc
Mistag normalization                      17-29%          W+light
tt normalization                          12%             tt
Monte Carlo generator                     1-5%            single top
==============================================================================
Single top normalization                  12%             single top
Top mass                                  2-12%    X      single top, tt

The Poisson probabilities are functions of the number of
observed data events $d_i$ in each bin and the prediction
$\mu_i$ in each bin, where i ranges from 1 to $n_{\rm bins}$.
The likelihood function is given by



$$ L = \prod_{i=1}^{n_{\rm bins}} \frac{\mu_i^{d_i}\, e^{-\mu_i}}{d_i!}. \tag{19} $$



The prediction in each bin is a sum over signal and back- 
ground contributions: 



$$ \mu_i = \sum_{k=1}^{n_{\rm bkg}} b_{ik} + \sum_{k=1}^{n_{\rm sig}} s_{ik}, \tag{20} $$



where $b_{ik}$ is the background prediction in bin i for
background source k, and $n_{\rm bkg}$ is the total number
of background contributions. The signal is the sum of the
s-channel and t-channel contributions; $n_{\rm sig} = 2$ is
the number of signal sources, and the $s_{ik}$ are their
predicted yields in each bin. The predictions $b_{ik}$ and
$s_{ik}$ depend on $n_{\rm nuis}$ uncertain nuisance
parameters $\theta_m$, where $m = 1, \ldots, n_{\rm nuis}$,
one for each independent source of systematic uncertainty.
These nuisance parameters are given Gaussian priors
centered on zero with unit width, and their impacts on the
signal and background predictions are described in the
steps below.
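For concreteness, Eqs. (19) and (20) can be evaluated in log form with the Python sketch below. It assumes the bins of all channels have been concatenated into one list and that the per-bin predictions are already available; working with the logarithm and `math.lgamma` avoids overflow for large counts. The function names are illustrative only.

```python
import math

def bin_prediction(backgrounds, signals):
    """Eq. (20): mu_i as the sum of all background and signal yields in bin i."""
    return sum(backgrounds) + sum(signals)

def log_likelihood(data, predictions):
    """Logarithm of Eq. (19): sum over bins of ln[mu^d e^(-mu) / d!]."""
    total = 0.0
    for d, mu in zip(data, predictions):
        if mu <= 0.0:
            if d == 0:
                continue          # Poisson(0 | 0) = 1, contributes ln 1 = 0
            return -math.inf      # observed events where none are predicted
        total += d * math.log(mu) - mu - math.lgamma(d + 1)
    return total

# Hypothetical two-bin example: two background sources plus the s- and
# t-channel signals (n_sig = 2) in each bin.
mu = [bin_prediction([5.0, 2.0], [0.4, 0.6]),
      bin_prediction([1.0, 0.5], [0.8, 1.7])]
print(mu, log_likelihood([9, 3], mu))
```

Summing logarithms over bins is equivalent to the product in Eq. (19) and is numerically safer when there are many bins.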

In the discussion below, the procedure for applying
systematic shifts to the signal and background predictions
is given step by step for each kind of systematic
uncertainty. Shape uncertainties are applied first, then
bin-by-bin uncertainties, and finally rate uncertainties. The
bin-by-bin uncertainties arise from limited Monte Carlo
statistics (or limited statistics in a data control sample)
and are taken to be independent of each other and of all
other sources of systematic uncertainty. The steps are
labeled $b^0_{ik}$ for the central, unvaried background
prediction in each bin, and $b_{ik}$ for the prediction with
all systematic uncertainties applied.

The contribution to a bin's prediction from a given
source of shape uncertainty is modified by linearly
interpolating and extrapolating the difference between the
central prediction $b^0_{ik}$ and the prediction
$\kappa^{m+}_{b,ik}$ in a histogram corresponding to a
$+1\sigma$ variation if $\theta_m > 0$, and performing a
similar operation using the $-1\sigma$ varied histogram
$\kappa^{m-}_{b,ik}$ if $\theta_m < 0$:



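$$ b^1_{ik} = b^0_{ik} + \sum_{m=1}^{n_{\rm nuis}} \begin{cases} \theta_m \left(\kappa^{m+}_{b,ik} - b^0_{ik}\right), & \theta_m > 0, \\ \theta_m \left(b^0_{ik} - \kappa^{m-}_{b,ik}\right), & \theta_m < 0. \end{cases} \tag{21} $$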



The parameter list is shared between the signal and back- 
ground predictions because some sources of systematic 
uncertainty affect both in a correlated way. The appli- 
cation of shape uncertainties is not allowed to produce 
a negative prediction in any bin for any source of back- 
ground or signal: 



$$ b^2_{ik} = \max\left(0,\, b^1_{ik}\right). \tag{22} $$



Each template histogram, including the systematically
varied histograms, has a statistical uncertainty in each
bin. These bin-by-bin uncertainties are linearly
interpolated in each bin in the same way as the predicted
values. This procedure works well when the shape-variation
templates share all or most of the same events, but it
overestimates the bin-by-bin uncertainties when the
alternate shape templates are filled with independent
samples. If the bin-by-bin uncertainty on $b^0_{ik}$ is
$\delta^0_{b,ik}$, and the bin-by-bin uncertainty on
$\kappa^{m\pm}_{b,ik}$ is $\delta^{m\pm}_{b,ik}$, then

$$ \delta_{b,ik} = \delta^0_{b,ik} + \sum_{m=1}^{n_{\rm nuis}} \begin{cases} \theta_m \left(\delta^{m+}_{b,ik} - \delta^0_{b,ik}\right), & \theta_m > 0, \\ \theta_m \left(\delta^0_{b,ik} - \delta^{m-}_{b,ik}\right), & \theta_m < 0. \end{cases} \tag{23} $$

Each bin of each background has a nuisance parameter
$\eta_{b,ik}$ associated with it:

$$ b^3_{ik} = b^2_{ik} + \eta_{b,ik}\, \delta_{b,ik}, \tag{24} $$

where $\eta_{b,ik}$ is drawn from a Gaussian centered on
zero with unit width when integrating over it. If
$b^3_{ik} < 0$, then $\eta_{b,ik}$ is redrawn from that
Gaussian.

Finally, rate uncertainties are applied multiplicatively.
If the fractional uncertainty on $b^3_{ik}$ due to nuisance
parameter m is $\epsilon^{m+}_{b,ik}$ for a $+1\sigma$
variation and $\epsilon^{m-}_{b,ik}$ for a $-1\sigma$
variation (both taken as signed fractional changes), then
a quadratic function is determined to make a smooth
application of the nuisance parameter to the predicted
value:



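$$ b_{ik} = b^3_{ik} \prod_{m=1}^{n_{\rm nuis}} \left( 1 + \frac{\epsilon^{m+}_{b,ik} - \epsilon^{m-}_{b,ik}}{2}\,\theta_m + \frac{\epsilon^{m+}_{b,ik} + \epsilon^{m-}_{b,ik}}{2}\,\theta_m^2 \right). \tag{25} $$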



The rate uncertainties are applied multiplicatively
because most of them, such as the luminosity and
acceptance uncertainties, affect the rates through scale
factors, and they are applied last because they affect the
distorted shapes in the same way as the undistorted
shapes. Multiple shape uncertainties are treated
additively because most of them correspond to events
migrating from one bin to another.

The signal predictions are based on their Standard
Model rates. These are scaled to test other values of
the single top quark production cross sections:

$$ s_{ik} = s^{\dagger}_{ik}\, \beta_k, \tag{26} $$

where $\beta_s$ scales the s-channel signal and $\beta_t$
scales the t-channel signal, and the "†" superscript
indicates that the same chain of nuisance parameters is
applied to the signal prediction as to the background
prediction.
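To make the order of operations in Eqs. (21) to (25) concrete, here is a minimal Python sketch for a single bin. It assumes nuisance parameters are identified by name, so that one parameter (for example a jet-energy-scale shift) can drive both a shape and a rate variation; all names and numbers are hypothetical, and the rate uncertainties are taken as signed fractional shifts, so a downward variation carries a negative value.

```python
import random

def interpolate(central, theta, up, down):
    """Piecewise-linear shift used in Eqs. (21) and (23) for one source m."""
    if theta > 0:
        return theta * (up - central)
    return theta * (central - down)

def rate_factor(theta, eps_up, eps_down):
    """Quadratic of Eq. (25): 1 + eps_down at theta = -1, 1 at 0, 1 + eps_up at +1."""
    return (1.0 + 0.5 * (eps_up - eps_down) * theta
            + 0.5 * (eps_up + eps_down) * theta ** 2)

def predict_bin(b0, d0, thetas, shape, bin_stat, eta, rate):
    """Apply shape, bin-by-bin, and rate uncertainties to one bin, in that order.

    thetas   -- {name: theta_m}, shared by shape and rate sources
    shape    -- {name: (kappa_up, kappa_down)}, shifted-template yields in this bin
    bin_stat -- {name: (delta_up, delta_down)}, statistical errors of those templates
    rate     -- {name: (eps_up, eps_down)}, signed fractional rate shifts
    """
    b1 = b0 + sum(interpolate(b0, thetas[m], up, dn)
                  for m, (up, dn) in shape.items())            # Eq. (21)
    b2 = max(0.0, b1)                                           # Eq. (22)
    d = d0 + sum(interpolate(d0, thetas[m], up, dn)
                 for m, (up, dn) in bin_stat.items())           # Eq. (23)
    b = b2 + eta * d                                            # Eq. (24)
    for m, (eps_up, eps_down) in rate.items():
        b *= rate_factor(thetas[m], eps_up, eps_down)           # Eq. (25)
    return b

def draw_eta(b2, d, rng):
    """Draw eta from N(0, 1), redrawing while b2 + eta * d would be negative."""
    while True:
        eta = rng.gauss(0.0, 1.0)
        if b2 + eta * d >= 0.0:
            return eta

# Hypothetical bin: a JES-like source with shape and rate effects, plus luminosity.
thetas = {"jes": 0.5, "lumi": -1.0}
b = predict_bin(b0=10.0, d0=1.0, thetas=thetas,
                shape={"jes": (12.0, 9.0)}, bin_stat={"jes": (1.2, 0.9)},
                eta=0.0, rate={"jes": (0.05, -0.04), "lumi": (0.06, -0.06)})
print(b)
eta = draw_eta(0.0, 1.0, random.Random(1))   # must be >= 0 when b2 = 0
```

The sketch takes η as an input to keep the steps separate; a caller integrating over it would draw η with `draw_eta` from the same bin's $b^2_{ik}$ and $\delta_{b,ik}$.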

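The likelihood is a function of the observed data
$D = \{d_i\}$, the signal scale factors
$\beta = \{\beta_s, \beta_t\}$, the nuisance parameters
$\theta = \{\theta_m\}$ and $\eta = \{\eta_{b,ik}, \eta_{s,ik}\}$,
the central values of the signal and background predictions
$s = \{s^0_{ik}\}$ and $b = \{b^0_{ik}\}$, and the rate,
shape, and bin-by-bin uncertainties
$\epsilon = \{\epsilon^{m\pm}_{b,ik}, \epsilon^{m\pm}_{s,ik}\}$,
$\kappa = \{\kappa^{m\pm}_{b,ik}, \kappa^{m\pm}_{s,ik}\}$, and
$\delta = \{\delta^{m\pm}_{b,ik}, \delta^{m\pm}_{s,ik}\}$:

$$ L = L(D \,|\, \beta, \theta, \eta, s, b, \epsilon, \kappa, \delta). \tag{27} $$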



B. Cross Section Measurement 

Because the signal template shapes and the tt background
template rates and shapes are functions of $m_t$, we quote
the single top quark cross section assuming a top quark
mass of $m_t = 175$ GeV/c² and also evaluate
$d\sigma_{s+t}/dm_t$. We therefore do not include the
uncertainty on the top quark mass when measuring the
cross section.



1. Measurement of $\sigma_{s+t}$

We measure the total cross section of single top quark
production, $\sigma_{s+t}$, assuming the SM ratio between
s-channel and t-channel production: $\beta_s = \beta_t = \beta$.
We use a Bayesian marginalization technique [101] to
incorporate the effects of systematic uncertainty:

$$ L'(\beta) = \int L(D \,|\, \beta, \theta, \eta, s, b, \epsilon, \kappa, \delta)\, \pi(\theta)\, \pi(\eta)\, d\theta\, d\eta, \tag{28} $$



where the π functions are the Bayesian priors assigned to
each nuisance parameter. The priors are unit Gaussian
functions centered on zero that are truncated whenever
the value of a nuisance parameter would result in a
non-physical prediction. The measured cross section
corresponds to the maximum of $L'$, which occurs at
$\beta^{\max}$:

$$ \sigma^{\rm meas}_{s+t} = \beta^{\max}\, \sigma^{\rm SM}_{s+t}. \tag{29} $$

The uncertainty corresponds to the shortest interval
$[\beta_{\rm low}, \beta_{\rm high}]$ containing 68% of the
integral of the posterior, assuming a prior that is uniform
for positive β, $\pi(\beta) = 1$ for $\beta \ge 0$:

$$ 0.68 = \frac{\int_{\beta_{\rm low}}^{\beta_{\rm high}} L'(\beta)\, \pi(\beta)\, d\beta}{\int_0^{\infty} L'(\beta)\, \pi(\beta)\, d\beta}. \tag{30} $$

This prescription has the property that, when
$\beta_{\rm low} > 0$, the numerical value of the posterior
at the low end of the interval is equal to that at the high
end of the interval.

Following the example of other top quark properties
analyses, the single top quark cross section is measured
assuming a top quark mass of 175 GeV/c². This
measurement is repeated with separate Monte Carlo
samples and background estimates generated with masses
of 170 GeV/c² and 180 GeV/c², and the result is used to
find $d\sigma_{s+t}/dm_t$.

2. Extraction of Bounds on $|V_{tb}|$



The parameter

$$ \beta = \frac{\sigma_{s+t}}{\sigma^{\rm SM}_{s+t}} \tag{31} $$

is identified in the Standard Model as $|V_{tb}|^2$ under the
assumption that $|V_{td}|^2 + |V_{ts}|^2 \ll |V_{tb}|^2$ and that
new physics contributions affect only $|V_{tb}|$. The
theoretical uncertainty on $\sigma^{\rm SM}_{s+t}$ must be
introduced for this calculation. The 95% confidence lower
limit on $|V_{tb}|$ is calculated by requiring
$0 < |V_{tb}| < 1$ and finding the point at which 95% of the
area under the posterior curve lies to the right of that
point. This calculation uses a prior that is flat in
$|V_{tb}|^2$.
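The interval and limit prescriptions of Eqs. (29) to (31) can be sketched numerically on a grid, as below. This Python illustration assumes the marginalized posterior $L'(\beta)$ has already been tabulated on a uniform grid with the flat prior for $\beta \ge 0$ folded in; the Gaussian-shaped posterior used here is a hypothetical stand-in, not a result of this analysis. The shortest interval is found by accumulating grid points in order of decreasing posterior density, which yields a single contiguous interval only for a unimodal posterior. For the $|V_{tb}|$ limit, the sketch treats β as $|V_{tb}|^2$ with a prior flat on [0, 1] and assumes the theoretical uncertainty on $\sigma^{\rm SM}_{s+t}$ is already included in the posterior.

```python
import math

def shortest_interval(xs, post, cl=0.68):
    """Shortest interval holding a fraction `cl` of a unimodal posterior on a uniform grid."""
    total = sum(post)
    order = sorted(range(len(xs)), key=lambda j: post[j], reverse=True)
    acc, chosen = 0.0, []
    for j in order:                      # highest-density grid points first
        chosen.append(j)
        acc += post[j]
        if acc >= cl * total:
            break
    return xs[min(chosen)], xs[max(chosen)]

def lower_limit(xs, post, cl=0.95):
    """Point above which a fraction `cl` of the posterior lies."""
    total = sum(post)
    acc = 0.0
    for x, p in zip(reversed(xs), reversed(post)):
        acc += p
        if acc >= cl * total:
            return x
    return xs[0]

# Hypothetical posterior L'(beta): Gaussian peak at 0.9 with width 0.15.
step = 0.001
betas = [i * step for i in range(3001)]                  # beta in [0, 3]
post = [math.exp(-0.5 * ((b - 0.9) / 0.15) ** 2) for b in betas]

beta_max = betas[max(range(len(betas)), key=post.__getitem__)]   # Eq. (29)
beta_low, beta_high = shortest_interval(betas, post)              # Eq. (30)

# |Vtb|^2 = beta, restricted to 0 < |Vtb| < 1 with a prior flat in |Vtb|^2.
phys = [(b, p) for b, p in zip(betas, post) if b <= 1.0]
x95 = lower_limit([b for b, _ in phys], [p for _, p in phys])
vtb_lower = math.sqrt(x95)
print(f"beta_max={beta_max:.3f}, 68% interval=[{beta_low:.3f}, {beta_high:.3f}], "
      f"|Vtb| > {vtb_lower:.3f} at 95%")
```

Restricting the grid to β ≤ 1 before computing the limit is what implements the requirement $0 < |V_{tb}| < 1$.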



C. Check for Bias 

As a cross-check of the cross-section measurement 
method, simulated pseudoexperiments were generated, 
randomly fluctuating the systematically uncertain nui- 
sance parameters, propagating their impacts on the pre- 
dictions of each signal and background source in each bin 
of each histogram, and drawing random Poisson pseudo- 
data in those bins from the fluctuated means. Samples 
of pseudoexperiments were generated assuming different 
signal cross sections, and the cross section posterior was 
formed for each one in the same way as it is for the data. 
We take the value of the cross section that maximizes
the posterior as the best-fit value, and calculate the total
uncertainty on it in the same way as for the data. The
resulting pull distribution is a unit Gaussian, provided
that the input cross section for the pseudoexperiments is
sufficiently far from zero.

Because the prior for the cross section does not allow
negative values, the procedure described here cannot
produce a negative cross section measurement. For an input
cross section of zero, half of the pseudoexperiments will
have measured cross sections that are exactly zero, and
the other half form a distribution of positive cross
sections. We therefore compare the median measured cross
section, rather than the average, which is biased, with the
input cross section of the pseudoexperiments. The 68% and
95% ranges of extracted cross sections, centered on the
median, are shown as a function of the input cross section
in Fig. 35, demonstrating that the measurement technique
does not introduce bias for any value of the cross section
used as input to the pseudoexperiments. These checks were
performed for each analysis; Fig. 35 shows the results for
the super discriminant combination, which is described in
Section X. Some nuisance parameters have asymmetric
priors, and the inclusion of their corresponding systematic
uncertainties will shift the fitted cross section. This is not
a bias that must be corrected, but rather a consequence
of our belief that the values of the uncertain parameters
are not centered on their central values.

FIG. 35: Check of the bias of the cross section measurement
method using pseudoexperiments, for the super discriminant
combination described in Section X (horizontal axis: input
cross section). The points indicate the median fitted cross
section, and the bands show the 68% and 95% quantiles of
the distribution of the fitted cross section, as functions of
the input cross section. A line is drawn showing equal input
and fitted cross sections; it is not a fit to the points.

D. Significance Calculation

The other goal of the search is to establish observation
of single top quark production. The significance is
summarized by a p-value, the probability of observing an
outcome of an experiment at least as signal-like as the
one observed, assuming that a signal is absent. We follow
the convention that a p-value less than $1.35 \times 10^{-3}$
constitutes evidence for a signal, and that a p-value less
than $2.87 \times 10^{-7}$ constitutes a discovery. These are
the one-sided integrals of the tails of a unit Gaussian
distribution beyond $+3\sigma$ and $+5\sigma$, respectively.

We rank experimental outcomes on a one-dimensional
scale using the likelihood ratio [o^l

$$ -2\ln Q = -2\ln \frac{L(D \,|\, \beta, \hat\theta_{\rm SM}, \hat\eta_{\rm SM}, s = s_{\rm SM}, b, \epsilon, \kappa, \delta)}{L(D \,|\, \beta, \hat\theta_0, \hat\eta_0, s = 0, b, \epsilon, \kappa, \delta)}, \tag{32} $$

where $\hat\theta_{\rm SM}$ and $\hat\eta_{\rm SM}$ are the
best-fit values of the nuisance parameters that maximize L
given the data D, assuming the single top quark signal is
present at its SM rate, and $\hat\theta_0$ and $\hat\eta_0$
are the best-fit values of the nuisance parameters that
maximize L assuming that no single top quark signal is
present. These fits are employed not to incorporate
systematic uncertainties but to optimize the sensitivity.
Fits to other nuisance parameters do not appreciably
improve the sensitivity of the search and are not
performed; only the most important nuisance parameters
are fit: the heavy-flavor fraction in W+jets events and the
mistag rate.

The desired p-value is then

$$ p = P\left(-2\ln Q \le -2\ln Q_{\rm obs} \,\middle|\, s = 0\right), \tag{33} $$

since signal-like outcomes have smaller values of
$-2\ln Q$ than background-like outcomes. Systematic
uncertainties are included not in the definition of
$-2\ln Q$, which is a known function of the observed data
and is not uncertain, but rather in the expected
distributions of $-2\ln Q$ assuming $s = 0$ or
$s = s_{\rm SM}$, since it is our expectation that is
uncertain. These uncertainties are included in a Bayesian
fashion by averaging the distributions of $-2\ln Q$ over
variations of the nuisance parameters, weighted by their
priors. In practice, this is done by filling histograms of
$-2\ln Q$ with the results of simulated pseudoexperiments,
each drawn from predicted distributions after varying the
nuisance parameters according to their prior distributions.
The fit to the main nuisance parameters insulates
$-2\ln Q$ from fluctuations in the values of the nuisance
parameters and optimizes our sensitivity in the presence of
uncertainty.
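The p-value of Eq. (33) and the evidence and discovery thresholds can be computed as in the Python sketch below. The background-only distribution of $-2\ln Q$ here is a hypothetical Gaussian stand-in for the histograms filled from pseudoexperiments; in practice, resolving a p-value near the discovery threshold requires far more pseudoexperiments than this toy uses.

```python
import math
import random
from statistics import NormalDist

def one_sided_tail(z):
    """P(Z > z) for a unit Gaussian."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

EVIDENCE_P = one_sided_tail(3.0)    # about 1.35e-3
DISCOVERY_P = one_sided_tail(5.0)   # about 2.87e-7

def p_value(q_obs, q_background_only):
    """Eq. (33): fraction of s = 0 pseudoexperiments with -2lnQ <= the observed value."""
    n = sum(1 for q in q_background_only if q <= q_obs)
    return n / len(q_background_only)

def significance(p):
    """Convert a one-sided p-value to the equivalent number of Gaussian sigma."""
    if p <= 0.0:
        return math.inf    # no toy as signal-like: only a lower bound is really known
    if p >= 1.0:
        return -math.inf
    return NormalDist().inv_cdf(1.0 - p)

# Hypothetical background-only -2lnQ values from pseudoexperiments.
rng = random.Random(7)
toys = [rng.gauss(10.0, 4.0) for _ in range(100_000)]
p = p_value(-2.0, toys)
print(f"p = {p:.2e}, Z = {significance(p):.2f}")
```

Smaller $-2\ln Q$ is more signal-like, so the p-value counts toys at or below the observed value rather than above it.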

The measured cross section and the p-value depend on
the observed data. We gauge the performance of our
techniques not on the single random outcome observed in
the data but on the sensitivity, that is, the distribution
of outcomes expected if a signal is present. The
sensitivity of the cross section measurement is given by
the median expected total uncertainty on the cross
section, and the sensitivity of the significance
calculation is given by the median expected significance.
The distributions from which these sensitivities are
computed are obtained from Monte Carlo pseudoexperiments
with all nuisance parameters fluctuated according to their
priors. Optimizations of the analyses were based on the
median expected p-values, without reference to the
observed data. Indeed, the data events passing the event
selection requirements were hidden during the analysis
optimization.

In the computation of the observed and expected
p-values, we include all sources of systematic uncertainty
in the pseudoexperiments, including the theoretical
uncertainty in the signal cross sections and the top quark
mass. Because the observed p-value is the probability of
an upward fluctuation of the background prediction to
the observed data, with the outcomes ordered as
signal-like based on $-2\ln Q$, the observed p-value
depends only weakly on the predicted signal model and,
in particular, almost not at all on the predicted signal
rate. Hence the inclusion of the signal rate systematic
uncertainty in the observed p-value has practically no
impact, and the shape uncertainties in the signal model
also have little impact (the background shape
uncertainties, however, are quite important). On the
other hand, the expected p-value and the cross section
measurement depend on the signal model and its
uncertainties; for example, a large signal is expected to
be easier to discover than a small signal.



X. COMBINATION 

The four analyses presented in Section VII each seek
to establish the existence of single top quark production
and to measure the production cross section, each using
the same set of selected events. Furthermore, the same
models of the signal and background expectations are
shared by all four analyses. We therefore expect the
results to have a high degree of statistical and systematic
correlation. Nonetheless, the techniques used to separate
the signal from the background are different and are not
guaranteed to be fully optimal for observation or cross
section measurement purposes; the figures of merit
optimized in the construction of each of the discriminants
are not directly related to either of our goals, but are
instead synthetic functions designed to be easy to use
during the training, such as the Gini function [9g used by
the BDT analysis and a sum of squared classification
errors used by the neural network analysis.

The discriminants all perform well in separating the expected signal from the expected background, and their values are highly correlated event by event, as expected, since they draw on much of the same input information, although they use it in different ways. The coefficients of linear correlation between the four discriminants vary between 0.55 and 0.8, depending on the pair of discriminants chosen and on the data or Monte Carlo sample used to evaluate the correlation. Since any invertible function of a discriminant variable has the same separating power as the variable itself, while the coefficients of linear correlation between pairs of variables change if the variables are transformed, these coefficients are not particularly useful except to verify that the results are indeed highly, but possibly not fully, correlated.
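The statement that linear correlation coefficients change under invertible transformations, while separating power does not, can be checked with a toy. The sketch below uses two hypothetical correlated Gaussian variables in place of real discriminants; it compares the Pearson coefficient before and after a strictly increasing transformation and confirms that the event ordering, and hence any cut on the variable, is unchanged.

```python
import math
import random


def pearson(x, y):
    """Linear (Pearson) correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Position of each value in ascending order (0 = smallest)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


rng = random.Random(2)
# Two hypothetical, correlated toy discriminants.
d1 = [rng.gauss(0.0, 1.0) for _ in range(5000)]
d2 = [a + rng.gauss(0.0, 1.0) for a in d1]
# A strictly increasing (hence invertible) transformation of d2.
d2_transformed = [math.exp(3.0 * b) for b in d2]

print(round(pearson(d1, d2), 2), round(pearson(d1, d2_transformed), 2))
# The event ordering, and therefore any selection cut, is unchanged:
print(ranks(d2) == ranks(d2_transformed))
```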

As a more relevant indication of how correlated the analyses are, we perform pseudoexperiments with fully simulated Monte Carlo events analyzed by each of the analyses and compute the correlations between the best-fit cross section values. The coefficients of linear correlation of the fit results are given in Table V.

TABLE V: Correlation coefficients between pairs of cross section measurements evaluated on Monte Carlo pseudoexperiments.

        LF      ME      NN      BDT
LF      1.0     0.646   0.672   0.635
ME              1.0     0.718   0.694
NN                      1.0     0.850
BDT                             1.0



The four discriminants, LF, ME, NN, and BDT, make use of different observable quantities as inputs. In particular, the LF, NN, and BDT discriminants use variables that assign observed particles to the hypothetical partons of single top quark production, while the ME method integrates over possible interpretations. Furthermore, since the correlations between pairs of the four discriminants differ among the physics processes, we expect this information also to be useful in separating the signal from the background processes. To extract a cross section and a significance, each event must be interpreted once, not four times, for Poisson statistics to apply. We therefore combine the analyses by forming a super discriminant: a scalar function of the four input discriminants that can be evaluated for each event in the data and for each event in the simulation samples. The functional form we choose is a neural network, similar to that used in the 2.2 fb^-1 single top quark combination at CDF [27] and in the recent H → WW search at CDF [103].
The distributions of the super discriminant are used to compute a cross section and a significance in the same way as is done for the component analyses.
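As an illustration of what "a scalar function of the four input discriminants" means in practice, the sketch below evaluates a one-hidden-layer feed-forward network on the LF, ME, NN, and BDT values of one event. The topology and all weights are hypothetical placeholders chosen for the example; the actual super discriminant network was obtained with the NEAT training discussed in this section and need not have this structure.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def super_discriminant(lf, me, nn, bdt, hidden_weights, hidden_biases,
                       output_weights, output_bias):
    """Evaluate a one-hidden-layer feed-forward network on the four
    component discriminant values of a single event."""
    inputs = (lf, me, nn, bdt)
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical weights for illustration only.
HIDDEN_W = [[1.5, 2.0, 1.0, 1.0], [-1.0, 0.5, 2.0, 1.5]]
HIDDEN_B = [-2.5, -1.0]
OUT_W = [2.0, 1.5]
OUT_B = -1.8

event = {"lf": 0.8, "me": 0.7, "nn": 0.9, "bdt": 0.6}
print(super_discriminant(**event, hidden_weights=HIDDEN_W, hidden_biases=HIDDEN_B,
                         output_weights=OUT_W, output_bias=OUT_B))
```

Because the output is a single number per event, the same histogramming and likelihood machinery used for each component analysis applies unchanged.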

In order to train and evaluate the super discriminant, and to make predictions that can be compared with the observations, a common set of events must be analyzed in the ME, NN, LF, and BDT frameworks. The discriminant values are collected from the separate analysis teams for each data event and for each event simulated in Monte Carlo. Events missing from, or extra in, one or more analyses are investigated and are restored or omitted as the discrepancies are found and understood. The W+jets predictions in particular involve weighting Monte Carlo events by mistag probabilities and by generator luminosity weights, and these event weights are also unified across the four analysis teams. The procedure of making a super discriminant combination thus provides a strong cross-check between analysis teams: it identified many kinds of simple mistakes and required us to correct them before proceeding. All of these cross-checks were performed at the stage in which event data were exchanged and before the training of the final discriminant, preserving the blindness of the result.

We further take the opportunity during the combination procedure to optimize our final discriminant for the goal we set, namely to maximize the probability of observing single top quark production. A typical approach to neural network training uses a gradient-descent method, such as back-propagation, to minimize the classification error, defined by Σ_i (o_i − t_i)^2, where o_i is the output of the neural network for event i and t_i is the desired output, usually zero for background and one for signal. Although back-propagation is a powerful and fast technique for training neural networks, minimizing the classification error does not necessarily provide the greatest sensitivity in a search. The best choice of figure of merit to optimize is the median expected p-value for the discovery of single top quark production, but it cannot be computed quickly: once a candidate network is proposed, the Monte Carlo samples must be run through it, the distributions made, and many millions of pseudoexperiments run to evaluate its discovery potential. Even if a more lightweight figure of merit is computed from the predicted distributions of the signal and background processes, the step of reading through all of the Monte Carlo samples limits the number of candidate neural networks that can practically be considered.
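A four-event toy shows why a lower classification error need not mean better separation. The outputs below are hypothetical; the pairwise ranking fraction (the area under the ROC curve) is used only as a simple stand-in for discriminating power and is not the p-value-based figure of merit preferred in the text. Network B has the lower classification error, yet only network A orders every signal event above every background event.

```python
def classification_error(outputs, targets):
    """Sum over events of (o_i - t_i)^2, the quantity back-propagation minimizes."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets))


def pairwise_separation(signal_outputs, background_outputs):
    """Fraction of (signal, background) pairs ranked correctly; 1.0 means some
    cut on the output separates the two samples perfectly."""
    good = sum(1 for s in signal_outputs for b in background_outputs if s > b)
    return good / (len(signal_outputs) * len(background_outputs))


targets = [1, 1, 0, 0]                 # two signal events, then two background
net_a = [0.6, 0.6, 0.4, 0.4]           # timid but perfectly ordered
net_b = [1.0, 0.45, 0.0, 0.5]          # confident, one background above a signal

for name, out in (("A", net_a), ("B", net_b)):
    print(name, round(classification_error(out, targets), 4),
          pairwise_separation(out[:2], out[2:]))
```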

We therefore optimize our networks with the novel training method of neuro-evolution, which uses genetic algorithms instead of back-propagation. This technique allows us to compute an arbitrary figure of merit for a particular network configuration, one that depends on all of the training events rather than on one event at a time. The software package we use here is NeuroEvolution of Augmenting Topologies (NEAT) [103]. NEAT can optimize both the inter-node weights and the network topology, adding and rearranging nodes as needed to improve the performance.
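The core idea of neuro-evolution, scoring each candidate network with a figure of merit computed from the whole training sample and breeding the best candidates, can be sketched as follows. This is a deliberately simplified genetic algorithm, not NEAT: the topology is fixed to a single sigmoid unit (NEAT also adds nodes and connections), selection is by truncation, mutation is Gaussian, and the histogram-based s/√b figure of merit is an assumed stand-in for the figures of merit used in the analysis. All names and parameters are hypothetical.

```python
import math
import random


def network_output(weights, event):
    """Fixed single-layer network: weighted sum plus bias, then a sigmoid."""
    z = weights[-1] + sum(w * x for w, x in zip(weights, event))
    if z >= 0:                                   # numerically safe logistic
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def figure_of_merit(weights, signal, background, n_bins=10):
    """Whole-sample figure of merit: sqrt(sum over bins of s^2 / b), using
    histograms of the network output. Bins with no background are skipped."""
    s_hist = [0] * n_bins
    b_hist = [0] * n_bins
    for event in signal:
        s_hist[min(int(network_output(weights, event) * n_bins), n_bins - 1)] += 1
    for event in background:
        b_hist[min(int(network_output(weights, event) * n_bins), n_bins - 1)] += 1
    return math.sqrt(sum(s * s / b for s, b in zip(s_hist, b_hist) if b > 0))


def evolve(signal, background, n_inputs, generations=30, pop_size=20,
           mutation_width=0.5, seed=0):
    """Truncation selection plus Gaussian mutation of the weights only."""
    rng = random.Random(seed)
    population = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs + 1)]
                  for _ in range(pop_size)]

    def fitness(weights):
        return figure_of_merit(weights, signal, background)

    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 4]        # survivors (elitism)
        children = [[w + rng.gauss(0.0, mutation_width)
                     for w in rng.choice(parents)]
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    rng = random.Random(5)
    sig = [[rng.gauss(1.0, 1.0), rng.gauss(1.0, 1.0)] for _ in range(200)]
    bkg = [[rng.gauss(-1.0, 1.0), rng.gauss(-1.0, 1.0)] for _ in range(200)]
    best = evolve(sig, bkg, n_inputs=2)
    print(best, figure_of_merit(best, sig, bkg))
```

Because the best quarter of each generation survives unchanged, the best figure of merit in the population can never decrease from one generation to the next.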

We train the NEAT networks using half of the events in each Monte Carlo sample, reserving the other half for predicting the outcomes in an unbiased way and for checking for overtraining. All background processes except non-W are included in the training; the non-W sample is excluded because it suffers from extremely low statistics. The output values are stored in histograms, which are used for the figure-of-merit calculation. We use two figures of merit that are closely related to the median expected p-value but can be calculated much more quickly:

"σ-value": This figure of merit (so named because it is closely related to the expected p-value) is obtained from an ensemble of pseudoexperiments by taking the difference between the medians of the test statistic −2 ln Q for the background-only and signal-plus-background hypotheses, divided by the quadrature sum of the widths of those distributions:



$$
\frac{\mathrm{med}\left(-2\ln Q_{B}\right) - \mathrm{med}\left(-2\ln Q_{S+B}\right)}{\sqrt{\left(\Delta(-2\ln Q_{B})\right)^{2} + \left(\Delta(-2\ln Q_{S+B})\right)^{2}}} \qquad (34)
$$



Figure 39(c) shows the distributions of −2 ln Q separately for S+B and B-only pseudoexperiments for the final network chosen. Typically, 2500 pseudoexperiments give a precision of roughly 1 to 2% and require one to two minutes to calculate. This is still too slow to be used directly in the evolution, but it is used at the end to select the best network from a sample of high-performing networks identified during the evolution. This figure of merit includes all rate and shape systematic uncertainties.

Analytic Figure of Merit: As a faster alternative to the figure of merit defined above, we calculate the quadrature sum, over all bins of each histogram, of the expected signal divided by the square root of the expected background (s/√b). To account for the effects of finite Monte Carlo statistics, this figure of merit is calculated repeatedly, each time letting the expected signal and background yields fluctuate according to a Gaussian distribution with a width equal to the Monte Carlo statistical uncertainty of each bin. The median of these trials is quoted as the figure of merit. This figure of merit does not include rate and shape systematic uncertainties.
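Both figures of merit can be written compactly. In the sketch below the "width" of each −2 ln Q ensemble in Eq. (34) is assumed to be its standard deviation, bins whose fluctuated background is non-positive are skipped, and fluctuated signal yields are clipped at zero; these choices, the function names, and the three-bin example are assumptions made for illustration.

```python
import math
import random
import statistics


def median_separation_fom(m2lnq_b_only, m2lnq_s_plus_b):
    """Eq. (34): difference of the medians of -2 ln Q for the B-only and S+B
    ensembles over the quadrature sum of their widths. The width of each
    ensemble is taken here to be its standard deviation (an assumption)."""
    separation = (statistics.median(m2lnq_b_only)
                  - statistics.median(m2lnq_s_plus_b))
    width = math.hypot(statistics.stdev(m2lnq_b_only),
                       statistics.stdev(m2lnq_s_plus_b))
    return separation / width


def analytic_fom(signal, background, signal_err, background_err,
                 n_trials=201, seed=0):
    """Median over Gaussian-fluctuated trials of sqrt(sum_i s_i^2 / b_i)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        total = 0.0
        for s, b, ds, db in zip(signal, background, signal_err, background_err):
            s_trial = max(rng.gauss(s, ds), 0.0)    # no negative signal yields
            b_trial = rng.gauss(b, db)
            if b_trial > 0.0:                       # skip non-physical bins
                total += s_trial * s_trial / b_trial
        trials.append(math.sqrt(total))
    return statistics.median(trials)


if __name__ == "__main__":
    # Hypothetical three-bin histograms with their MC statistical errors.
    print(analytic_fom([1.0, 3.0, 6.0], [50.0, 10.0, 2.0],
                       [0.1, 0.3, 0.5], [3.0, 1.0, 0.4]))
```

The analytic version only needs the histogrammed predictions, which is why it is cheap enough to evaluate for every candidate network, while the pseudoexperiment-based version is reserved for the final selection.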

The network training procedure also incorporates an optimization of the binning of the histograms of the network output. In general, the sensitivity is increased by separating events into bins of different purity; combining the contents of bins of different purity degrades our ability to test for the existence of the signal and to measure the cross section. Competing against our desire for fine gradations of purity is our need for solid predictions, with reliable uncertainties, of the signal and background yields in each bin: binning the output histogram too finely can overestimate the sensitivity because of downward fluctuations in the Monte Carlo background predictions. As described below, care is taken to allow the automatic binning optimization to maximize our sensitivity without overestimating it.

The procedure, applied to each channel separately, starts from a fixed binning of 100 bins in the neural network output between zero and one. The network output does not necessarily fill all 100 bins; different choices of network parameters, which are optimized by the training, fill different subsets of these bins. To avoid problems with Monte Carlo statistics at the extreme ends of the distributions, bins at the high end of the histogram are grouped together, and similarly at the low end, sacrificing a little separation of signal from background for more robust predictions. At each step, the horizontal axis is relabeled so that the histogram is defined between zero (lowest signal purity) and one (highest purity). The bins are first grouped so that no bin has a total background prediction of zero. Next, we require that the purity decrease monotonically as the output variable decreases from one towards zero: if a bin shows an anomalously high purity, its contents are collected with those of all bins with higher network outputs to form a new end bin. Finally, we require that on the high-purity side of the histogram the background prediction not drop off too quickly. We expect ln B ∝ ln S for all x in the highest-purity region of the histogram; if the background decreases at a faster rate, we group the bins at the high end together until this condition is met. After this procedure, we achieve a signal-to-background ratio exceeding 5:1 in the highest-discriminant output bins in the two-jet, one-b-tag sample.
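The first two rebinning rules can be sketched as follows, assuming each histogram is given as a list of (signal, background) yields ordered from low to high network output. The third rule (that ln B grow in proportion to ln S at the high-purity end) and the relabeling of the axis are not implemented here, and the function names are hypothetical.

```python
def purity(bin_):
    s, b = bin_
    return s / (s + b) if s + b > 0 else 0.0


def merge_zero_background(bins):
    """bins: list of (signal, background) ordered from low to high output.
    Folds every zero-background bin into a neighbouring bin."""
    merged = []
    for s, b in bins:
        if merged and merged[-1][1] == 0:
            prev_s, prev_b = merged.pop()
            s, b = s + prev_s, b + prev_b
        merged.append((s, b))
    # A zero-background bin left at the high end is folded downwards.
    if len(merged) > 1 and merged[-1][1] == 0:
        s, b = merged.pop()
        prev_s, prev_b = merged.pop()
        merged.append((s + prev_s, b + prev_b))
    return merged


def enforce_monotonic_purity(bins):
    """Collect any bin purer than its upper neighbour together with all
    higher-output bins into a new end bin, until purity rises monotonically."""
    bins = list(bins)
    i = 0
    while i < len(bins) - 1:
        if purity(bins[i]) > purity(bins[i + 1]):
            end_bin = (sum(s for s, _ in bins[i:]), sum(b for _, b in bins[i:]))
            bins = bins[:i] + [end_bin]
            i = 0          # the new end bin may break monotonicity lower down
        else:
            i += 1
    return bins


if __name__ == "__main__":
    raw = [(0, 0), (1, 0), (2, 5), (1, 2), (3, 0)]
    print(enforce_monotonic_purity(merge_zero_background(raw)))
```

Each merge removes at least one bin, so the loop always terminates, and the result never contains a bin with zero predicted background.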

The resulting templates and distributions are shown for all four selected data samples in Fig. 36. In the comparisons of the predictions to the data, the predictions are normalized to our signal and background models, which are described in Sections V and IV, respectively. Each distribution is more sensitive than any single analysis.



XI. ONE-DIMENSIONAL FIT RESULTS 

We use the methods described in Section IX to extract the single top quark cross section, the significance of the excess over the background prediction, and the sensitivity, defined to be the median expected significance, separately for each component analysis described in Section VIII and for the super discriminant combined analysis (SD), which is described in Section X. The results are listed in Table VI. The cross section measurements of the individual analyses are quite similar, which is not surprising given the overlap in the selected data samples. The measurements are only partially correlated, though, as shown in Table V, indicating that the separate analyses extract highly correlated but not identical information from each event.

Because the super discriminant has access to the most information on each event, and because it is optimized for the expected sensitivity, it is the most powerful single analysis. It is followed by the Neural Network (NN) and Boosted Decision Tree (BDT) analyses, and then by the Matrix Element (ME) analysis. The Likelihood Function (LF) analysis result in the table is shown only for the t-channel-optimized likelihood functions, although the s-channel signals were included in the templates.

A separate result, a measurement of the s-channel signal cross section alone, is extracted from the two-jet, two-b-tag LF analysis only, assuming that the t-channel signal cross section is at its SM value. The result thus obtained is σ_s = 1.5^{+0.9}_{-0.8} pb, with an observed significance of 2.0σ and an expected significance of 1.1σ.

The super discriminant analysis, like the component analyses, fits separately the distributions of events in eight non-overlapping categories, defined by whether the events have two or three jets passing the selection requirements, one or two b tags, and whether the charged lepton was a triggered e or μ candidate (TLC), as opposed to a non-triggered extended muon coverage lepton candidate (EMC). A separate cross section fit is done for each of these categories, and the results are shown in Table VII. The dominant components of the uncertainties are statistical, driven by the small data sample sizes in the purest bins of our discriminant distributions. The cross sections extracted for each final state are consistent with each other within their uncertainties.

The results described above are obtained from the ℓ + E̸_T + jets selection. An entirely separate analysis conducted by CDF is the search for single top quark events in the E̸_T plus two- and three-jet sample [28] (MJ), which uses a data sample corresponding to 2.1 fb^-1. The events selected by the MJ analysis do not overlap with those described in this paper, because the MJ analysis imposes a charged-lepton veto and an isolated high-p_T track veto. The MJ analysis separates its candidate events into three subsamples based on the b-tagging requirements [28], and the results are summarized in Table VII.

The distributions of the super discriminant in the ℓ + E̸_T + jets sample and of the MJ neural network discriminant in the E̸_T + jets sample are shown in Fig. 37, summed over the event categories, even though the cross section fits are performed and the significances are calculated with the categories kept separate. The sums over event categories add together the contents of histogram bins with different s/b and thus do not show the full separation power of the analyses. Another way to show the combined data set is to collect bins with similar s/b in all of the channels of the SD and MJ discriminant histograms and to plot the resulting distribution as a function of log10(s/b), which is shown in Fig. 38(a). This distribution isolates, on the high-s/b side, the events that contribute the most to the cross section measurement and the significance. Figure 38(b) shows the integral of this distribution, separately for the background prediction, the signal-plus-background prediction, and the data.
FIG. 36: Normalized templates (left) and plots comparing the predicted distributions with data (right) of the final combined neural network output for each selected data sample. These distributions are more sensitive than any single analysis. The data are indicated by points with error bars, and the predictions are shown stacked, with the stacking order following that of the legend.
The distributions are integrated from the highest-s/b side downwards, accumulating events and predictions in the highest-s/b bins. The data points are updated on the plot as bins with data entries are added to the integral, and thus are highly correlated from point to point. A clear excess of data over the background prediction is seen, not only in the purest bins but also as the s/b requirement is loosened, and the excess is consistent with the standard model single top prediction.
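The integrated distribution described above can be computed by pooling bins from all channels, ordering them by s/b, and accumulating from the top. As a simplification, this sketch orders individual bins rather than first grouping bins of similar s/b into coarser log10(s/b) intervals; the tuple layout and example numbers are hypothetical.

```python
import math


def cumulative_by_s_over_b(bins):
    """bins: (s, b, n_data) tuples pooled from every channel of the SD and MJ
    histograms. Returns rows (log10(s/b), cumulative B, cumulative S+B,
    cumulative data), integrating from the highest s/b downwards."""
    usable = [entry for entry in bins if entry[1] > 0]
    usable.sort(key=lambda entry: entry[0] / entry[1], reverse=True)
    rows = []
    cum_b = cum_sb = cum_data = 0.0
    for s, b, n in usable:
        cum_b += b
        cum_sb += s + b
        cum_data += n
        log_sb = math.log10(s / b) if s > 0 else float("-inf")
        rows.append((log_sb, cum_b, cum_sb, cum_data))
    return rows


if __name__ == "__main__":
    for row in cumulative_by_s_over_b([(1.0, 10.0, 12), (5.0, 1.0, 6), (0.5, 20.0, 19)]):
        print(row)
```

Because each row adds to all previous ones, successive data points share most of their events, which is why the integrated points are strongly correlated.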

Because the ℓ + E̸_T + jets sample and the E̸_T + jets sample have no overlapping events, they can be combined as separate channels using the same likelihood technique described in Section IX. The joint posterior distribution including all eleven independent categories simultaneously is shown in Figure 39(a). From this distribution, we obtain a single top quark cross section measurement of σ_{s+t} = 2.3^{+0.6}_{-0.5} pb, assuming a top quark mass of 175 GeV/c^2. The dependence of the measured cross section on the assumed top quark mass is dσ_{s+t}/dm_t = +0.02 pb/(GeV/c^2). Table VII shows the results of fitting for σ_s + σ_t in the separate jet, b-tag, and lepton categories. The dominant source of uncertainty is the statistical component from the data sample size. Our best-fit single top quark cross section is approximately one standard deviation below the Standard Model prediction of [9, 13]. The prediction of [11] is somewhat higher, but it is also consistent with our measurement.
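A stripped-down version of the one-dimensional extraction is sketched below: a binned Poisson likelihood with expected counts σ·a_i + b_i, a flat prior in σ ≥ 0, a grid scan for the posterior mode, and the shortest interval containing 68.3% of the posterior. Unlike the analysis in the text, it does not integrate over nuisance parameters; the acceptances a_i, backgrounds b_i, counts, and function names are hypothetical.

```python
import math


def log_likelihood(sigma, acceptance, background, data):
    """Binned Poisson log-likelihood with expected counts sigma*a_i + b_i."""
    total = 0.0
    for a, b, n in zip(acceptance, background, data):
        mu = sigma * a + b
        total += n * math.log(mu) - mu - math.lgamma(n + 1)
    return total


def posterior_scan(acceptance, background, data, sigma_max=10.0, steps=2000):
    """Posterior for sigma >= 0 on a uniform grid, with a flat prior, so the
    posterior is proportional to the likelihood. Returns (grid, posterior)."""
    grid = [sigma_max * i / steps for i in range(steps + 1)]
    logs = [log_likelihood(s, acceptance, background, data) for s in grid]
    peak = max(logs)                                  # avoid underflow
    weights = [math.exp(value - peak) for value in logs]
    norm = sum(weights)
    return grid, [w / norm for w in weights]


def mode_and_shortest_interval(grid, posterior, level=0.683):
    """Posterior mode and the shortest interval holding `level` of the
    probability (contiguous for a unimodal posterior)."""
    order = sorted(range(len(posterior)), key=posterior.__getitem__, reverse=True)
    kept, accumulated = [], 0.0
    for index in order:
        kept.append(index)
        accumulated += posterior[index]
        if accumulated >= level:
            break
    return grid[order[0]], grid[min(kept)], grid[max(kept)]


if __name__ == "__main__":
    # Hypothetical three-bin channel: acceptance (events per pb), background, data.
    grid, post = posterior_scan([2.0, 5.0, 8.0], [40.0, 12.0, 3.0], [45, 20, 19])
    print(mode_and_shortest_interval(grid, post))
```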

To extract |V_tb| from the combined measurement, we take advantage of the fact that the production cross section σ_{s+t} is directly proportional to |V_tb|^2. We use the relation

$$
|V_{tb}|^{2}_{\mathrm{measured}} = \frac{\sigma^{\mathrm{measured}}_{s+t}}{\sigma^{\mathrm{SM}}_{s+t}}\,|V_{tb}|^{2}_{\mathrm{SM}} \qquad (35)
$$

where |V_tb|^2_SM ≈ 1 and σ^SM_{s+t} = 2.86 ± 0.36 pb [9, 13]. Equation (35) further assumes that |V_tb|^2 ≫ |V_ts|^2 + |V_td|^2, because we assume that the top quark decays to Wb 100% of the time and that the production cross section scales with |V_tb|^2, whereas the other CKM matrix elements would contribute as well if they were not very small. We drop the "measured" subscripts and superscripts elsewhere. Figure 39(b) shows the joint posterior distribution of all of our independent channels as a function of |V_tb|^2 (which includes the theoretical uncertainty on the predicted production rate, which is not part of the cross section posterior), from which we obtain |V_tb| = 0.91 ± 0.11 (stat.+syst.) ± 0.07 (theory) and a 95% confidence level lower limit of |V_tb| > 0.71.
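Equation (35) can be inverted directly. Taking σ_{s+t} = 2.3 pb as the measured central value, 2.86 pb as the SM prediction, and |V_tb|^2_SM = 1, the naive point estimate is about 0.90. The quoted |V_tb| = 0.91 comes from the posterior in |V_tb|^2, which also folds in the theoretical uncertainty, so the two numbers are not expected to agree exactly. The function name is hypothetical.

```python
import math


def vtb_from_cross_section(sigma_measured, sigma_sm, vtb_sm_squared=1.0):
    """Invert Eq. (35): |Vtb|^2 = sigma_meas * |Vtb|^2_SM / sigma_SM."""
    return math.sqrt(sigma_measured * vtb_sm_squared / sigma_sm)


print(round(vtb_from_cross_section(2.3, 2.86), 3))  # prints 0.897
```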

We compute the p-value for the significance of this result as described in Section IX D. The distributions of −2 ln Q from which the p-value is obtained are shown in Fig. 39(c). We obtain a p-value of 3.1 × 10^-7, which corresponds to a 4.985 standard deviation excess of data above the background prediction. We quote this to two significant digits as a 5.0 standard deviation excess. The median expected p-value corresponds to more than 5.9 standard deviations; the precision of this estimate is limited by the number of pseudoexperiments that were fit. That the observed significance is approximately one sigma below its SM expectation is not surprising, given that our cross section measurement is also approximately one sigma below its expectation, although this relation is not strictly guaranteed.
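The conversion between a one-sided p-value and a Gaussian significance used above can be reproduced with the Python standard library, assuming the one-sided convention p = P(X > Z) for X ~ N(0, 1). The function names are illustrative.

```python
from statistics import NormalDist


def p_to_sigma(p):
    """One-sided Gaussian significance Z with P(X > Z) = p for X ~ N(0, 1)."""
    return -NormalDist().inv_cdf(p)


def sigma_to_p(z):
    """One-sided tail probability above Z standard deviations."""
    return NormalDist().cdf(-z)


print(f"{p_to_sigma(3.1e-7):.3f}")   # prints 4.985
```

Computing the tail directly with `inv_cdf(p)` avoids the loss of precision that evaluating `inv_cdf(1 - p)` would suffer for such small p.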

Recently, the cross section measurement shown here has been combined with that measured by D0 [29]. The same technique for extracting the cross section is used in the combination as for each individual measurement [104], and the best-fit cross section is σ_{s+t} = 2.76^{+0.58}_{-0.47} pb, assuming m_t = 170 GeV/c^2.

XII. TWO-DIMENSIONAL FIT RESULTS 

The extraction of the combined signal cross section σ_{s+t} proceeds by constructing a one-dimensional Bayesian posterior with a uniform prior in the cross section to be measured. An extension of this is to form the posterior in the two-dimensional plane of σ_s versus σ_t and to extract the s-channel and t-channel cross sections separately. We assume a uniform prior in the σ_s versus σ_t plane and integrate over the nuisance parameters in the same way as for the one-dimensional cross section extraction. The input histograms for this extraction are the distributions of the super discriminant for the W+jets analyses; the MJ discriminant histograms are also included, exactly as in the one-dimensional cross section fit.

The best-fit cross section is the one for which the posterior is maximized, and corresponds to σ_s = 1.8^{+0.7}_{-0.5} pb and σ_t = 0.8 ± 0.4 pb. The uncertainties on the measurements of σ_s and σ_t are correlated with each other because the s-channel and t-channel signals both populate the signal-like bins of each of our discriminant variables. Regions of 68.3%, 95.5%, and 99.7% credibility are derived from the posterior by finding the smallest region in area that contains 68.3%, 95.5%, or 99.7% of the integral of the posterior. Each region has the property that the values of the posterior along its boundary are all equal. The best-fit values, the credibility regions, and the SM predictions of σ_s and σ_t are shown in Fig. 40. We compare these with the NLO SM predictions of σ_t = 1.98 ± 0.25 pb and σ_s = 0.88 ± 0.11 pb [9, 13], and also with the NNNLO predictions of σ_t = 2.16 ± 0.12 pb and σ_s = 0.98 ± 0.04 pb [11].
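The construction of the smallest credibility region can be sketched on a grid: sort the cells by posterior value, add them from the highest down until the requested fraction of the total is reached, and use the last value as the contour level, so that the posterior is equal everywhere on the boundary. The sketch assumes equal-area grid cells; the correlated Gaussian in the example is a hypothetical posterior, not the measured one, and the function names are illustrative.

```python
import math


def hpd_threshold(posterior, level):
    """Posterior value on the boundary of the smallest region (counted in
    equal-area grid cells) that holds `level` of the total probability."""
    values = sorted((v for row in posterior for v in row), reverse=True)
    target = level * sum(values)
    accumulated = 0.0
    for v in values:
        accumulated += v
        if accumulated >= target:
            return v
    return values[-1]


def credibility_mask(posterior, level):
    """True for the grid cells inside the `level` credibility region; the
    posterior takes the same value everywhere on the region's boundary."""
    threshold = hpd_threshold(posterior, level)
    return [[v >= threshold for v in row] for row in posterior]


def toy_density(s, t, mu_s=1.0, mu_t=2.0, w_s=0.3, w_t=0.5, rho=-0.5):
    """Hypothetical correlated Gaussian posterior (unnormalized)."""
    u, v = (s - mu_s) / w_s, (t - mu_t) / w_t
    return math.exp(-(u * u - 2.0 * rho * u * v + v * v) / (2.0 * (1.0 - rho * rho)))


if __name__ == "__main__":
    step = 0.05
    axis = [i * step for i in range(81)]            # 0 to 4 pb in both directions
    posterior = [[toy_density(s, t) for s in axis] for t in axis]
    for level in (0.683, 0.955, 0.997):
        n_cells = sum(map(sum, credibility_mask(posterior, level)))
        print(f"{level:.1%} region area: {n_cells * step * step:.3f} pb^2")
```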

The coverage of the technique is checked by generating 1500 pseudo-datasets randomly drawn from systematically varied predictions, assuming that a single top signal is present as predicted by the SM, and performing the two-dimensional extraction of σ_s and σ_t for each one in the same way as for the data. No bias is seen in the median fitted σ_s and σ_t values. Each pseudo-dataset has a corresponding set of regions at 68.3%, 95.5%, and 99.7% credibility. The fractions of the pseudo-datasets' credibility regions that contain the input predictions for σ_s and σ_t are consistent with the credibility levels
TABLE VI: A summary of the analyses covered in this paper, with their measured cross sections, observed significances, and sensitivities, defined to be their median expected p-values converted into Gaussian standard deviations. The analyses are combined into a super discriminant (SD), which is combined with the orthogonal E̸_T + jets sample (MJ) to make the final CDF combination.

Analysis               Cross Section [pb]    Significance [σ]    Sensitivity [σ]
LF                     …                     2.4                 4.0
ME                     … (…, −0.6)           4.3                 4.9
NN                     1.8 (+0.6, …)         3.5                 5.2
BDT                    … (…, −0.6)           3.5                 5.2
SD                     2.1 (+0.6, −0.5)      4.8                 > 5.9
MJ                     4.9 (+2.5, −2.2)      2.1                 1.4
SD + MJ Combination    2.3 (+0.6, −0.5)      5.0                 > 5.9




FIG. 37: Comparison of the predicted distributions with data summed over all selected data samples of the super discriminant 
(left) and the MJ discriminant (right). Points with error bars indicate the observed data, while the stacked, shaded histograms 
show the predictions, including a standard model single top signal. In each panel, the order of the stacked components follows 
that of the legend. 



at which the bands are quoted. 
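The logic of the coverage test can be illustrated with a one-dimensional toy: for a Gaussian measurement with a flat prior, the 68.3% credible interval is the fitted value plus or minus one resolution, and the fraction of pseudo-datasets whose interval contains the true value should be close to 0.683. The resolution, true value, and function name are hypothetical; the check in the text uses the full two-dimensional fit on systematically varied pseudo-datasets.

```python
import random
import statistics


def coverage(true_value, resolution, n_pseudo=1500, z=1.0, seed=0):
    """Toy coverage test for a Gaussian measurement with a flat prior, whose
    credible interval is fitted value +/- z*resolution. Returns the fraction of
    pseudo-datasets whose interval contains the true value, and the median fit."""
    rng = random.Random(seed)
    fits = [rng.gauss(true_value, resolution) for _ in range(n_pseudo)]
    hits = sum(1 for fit in fits if abs(fit - true_value) <= z * resolution)
    return hits / n_pseudo, statistics.median(fits)


if __name__ == "__main__":
    for z, nominal in ((1.0, 0.683), (2.0, 0.955), (3.0, 0.997)):
        fraction, median_fit = coverage(2.0, 0.5, z=z)
        print(f"nominal {nominal:.3f}  observed {fraction:.3f}  median fit {median_fit:.2f}")
```

With 1500 pseudo-datasets the statistical precision on a 68.3% coverage fraction is about ±0.012, which sets the scale for "consistent with the credibility level".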

The two-dimensional fit result is not in good agreement with the SM prediction; the difference is at approximately the two standard deviation level. The differences between the measured s- and t-channel cross sections and their SM predictions are driven by the deficit of events observed in the high-discriminant output regions of the two-jet, one-b-tag channels relative to the SM signal-plus-background prediction, as shown in Fig. 35(b), and by the excess of events observed in the two-jet, two-b-tag distributions, as shown in Fig. 35(d). The measured total cross sections in these jet and b-tagging categories, listed in Table VII, show the effects of these discrepancies with respect to the SM predictions.

The newer calculation of the t-channel kinematic distributions [56, 57] predicts a larger fraction of t-channel signal events with a visible recoiling b jet, which is normally not reconstructed because it lies beyond the forward acceptance of the detector or because its E_T is too small. This calculation has almost the same overall cross section prediction for σ_t as the one we use elsewhere in this paper [9], but it reduces the two-jet, one-b-tag prediction for the t-channel signal and raises the two-jet, two-b-tag and three-jet predictions. After fully simulating and reconstructing the signal events, the effects on the predicted yields are small, and the three-jet channels' contribution to our measurement sensitivity is also small. Within the rounding precision of the results we quote, using the model of [56, 57] instead of our central prediction produces no noticeable change in the 1D and 2D fit results.

The t-channel process is sensitive to the b quark PDF of the proton, while the s-channel process is not. The low measured value of σ_t reported here is not in good agreement with the SM predictions. The D0 collaboration has recently measured σ_t = 3.14^{+0.94}_{-0.80} pb using a
FIG. 38: Distributions of data and predictions for the SD and MJ analyses, where bins of similar s/b have been collected together (left). The points with error bars indicate the observed data, while the stacked, shaded histograms show the predictions, including a standard model single top signal. These distributions are integrated starting on the high-s/b side, and the resulting cumulative event counts are shown on the right, separately for the observed data, the background-only prediction, and the signal-plus-background prediction.



TABLE VII: A summary of the measured values of the single top production cross section σ_s + σ_t using the super discriminant analysis, separately for each of the non-overlapping final state categories, based on the number of jets, the number of b tags, and the lepton category. Also listed are the MJ cross section fit results by b-tagging category.



Category                     Cross Section [pb]
SD 2-jet, 1-tag, TLC         … (…, −0.6)
SD 2-jet, 2-tag, TLC         4.1 (+2.3, …)
SD 3-jet, 1-tag, TLC         …
SD 3-jet, 2-tag, TLC         6.3 (…, …)
SD 2-jet, 1-tag, EMC         2.3 (…, …)
SD 2-jet, 2-tag, EMC         …
SD 3-jet, 1-tag, EMC         7.9 (+5.5, −4.6)
SD 3-jet, 2-tag, EMC         …
SD                           2.1 (+0.6, −0.5)
MJ 2-tag                     …
MJ 1-tag + JETPROB           2.7 (+4.6, −2.7)
MJ 1-tag                     …
MJ                           4.9 (+2.5, −2.2)
SD + MJ Combination          2.3 (+0.6, −0.5)



data sample corresponding to 2.3 fb^-1 of integrated luminosity [105], which is larger than the standard model prediction. Taken together, there is insufficient evidence to exclude a standard model explanation of the results.



XIII. SUMMARY 

The observation of single top quark production poses many difficult experimental challenges. CDF performs this analysis in proton-antiproton collisions at √s = 1.96 TeV in events with a leptonically decaying W boson and jets. The low signal-to-background ratio in the data samples passing our selection requirements necessitates precise modeling of the signal and background kinematic distributions with matrix-element-based Monte Carlo generators using full parton showering and detailed detector simulation, and also requires the normalization of the dominant background rates to measured rates in sideband data samples. The small signals and large, uncertain background processes also require us to take maximum advantage of the expected kinematic and flavor differences between the signals and the background processes. We develop novel, powerful techniques for combining information from several observable quantities computed for each event. We purify a subsample of single top quark events with a predicted signal-to-background ratio exceeding 5:1 from a sample starting with a signal-to-background ratio of 1:16 after b-tagging.

Our final discriminant variables are functions of many kinematic and b-tagging variables. Incorrect modeling of one or more variables, or even of the correlations between variables, can bias the results. We therefore evaluate an exhaustive list of systematic uncertainties that affect the predicted rates and kinematic distributions of the signal and background components, including both theoretical uncertainties and uncertainties that arise from discrepancies observed between the data and the simulations in control regions. The correlations between the systematic
[Figure 39 panels (a), (b), (c), titled "SD + MJ Combination"; horizontal axis of panel (c): test statistic −2 ln Q]

FIG. 39: The posterior curve of the cross section measurement calculated with the super discriminant histograms as inputs (a), the posterior curve for the |V_tb| calculation (b), and the distributions of −2 ln Q in simulated S+B and B-only pseudoexperiments, assuming a Standard Model single top quark signal (c). The value of −2 ln Q observed in the data is indicated with an arrow.



uncertainties on the rate and shape predictions of the signal and background processes in several data samples are taken into account in all of the results and in computing the expected sensitivities presented in this paper. We also consider Monte Carlo statistical uncertainties in each bin of each template histogram in each channel independently. We constrain the major background rates in situ in the selected event samples to further reduce the uncertainties in their values and to improve the sensitivity of our results.

Our analyses were optimized based on predictions and were blinded to the data during their development. The analyses were cross-checked using the data in control samples before we looked at the data in the signal regions. We perform many checks of our methods: we compare the observed and predicted distributions of the discriminant input and output variables in independent control samples, and we also train discriminants that enrich samples of each background as if it were signal. The vast majority of our cross-checks show that the predictions model the data very well, and those that show discrepancies contribute to our systematic uncertainties.

The four analyses in the ℓ + E̸_T + jets sample described in this paper are combined with a statistically independent analysis in the E̸_T + jets sample [28] to maximize the total sensitivity. We report an observation of electroweak single top quark production with a p-value of 3.1 × 10^-7, which corresponds to a significance of 5.0 standard deviations. The measured value of the combined s- and t-channel cross section is σ_{s+t} = 2.3^{+0.6}_{-0.5} pb
[Figure 40 panel: "SD + MJ Combination"; horizontal axis: s-channel cross section [pb]; legend: CDF Data, 68.3% CL, 95.5% CL, 99.7% CL, SM (NLO), SM (NNNLO)]



FIG. 40: The results of the two-dimensional fit for σ_s and σ_t. The black point shows the best-fit value, and the 68.3%, 95.5%, and 99.7% credibility regions are shown as shaded areas. The SM predictions are also indicated with their theoretical uncertainties. The SM predictions shown are the NLO calculation and the NNNLO calculation of [11].

assuming the top quark mass is 175 GeV/c^2 and the SM value of σ_s/σ_t. The dependence of the measured cross section on the assumed top quark mass is dσ_{s+t}/dm_t = +0.02 pb/(GeV/c^2). We extract a value of |V_tb| = 0.91 ± 0.11 (stat.+syst.) ± 0.07 (theory) and a 95% confidence level lower limit of |V_tb| > 0.71, using the prediction of [9, 13] for the SM cross section and assuming that |V_tb|^2 ≫ |V_ts|^2 + |V_td|^2. With a two-dimensional fit for σ_s and σ_t, using the same combination of analyses as the one-dimensional fit, we obtain σ_s = 1.8^{+0.7}_{-0.5} pb and σ_t = 0.8 ± 0.4 pb.